DESCRIPTION
A METHOD FOR ANALYZING POLYNUCLEOTIDES

FIELD OF THE INVENTION

The present invention relates generally to organic chemistry, analytical
chemistry, biochemistry, molecular biology, genetics, diagnostics and medicine. In
particular, it relates to a method for analyzing polynucleotides; i.e., for determining
the complete nucleotide sequence of a polynucleotide, for detecting variance in the
nucleotide sequence between related polynucleotides and for genotyping DNA.

BACKGROUND OF THE INVENTION 

The following is offered as background information only and is not intended nor 
admitted to be prior art to the present invention. 

DNA is the carrier of the genetic information of all living cells. An organism's
genetic and physical characteristics, its genotype and phenotype, respectively, are
controlled by precise nucleic acid sequences in the organism's DNA. The sum total of all
of the sequence information present in an organism's DNA is termed the organism's
"genome." A DNA molecule is a linear polymer of four "nucleotides." The four
nucleotides are tripartite molecules, each consisting of (1) one of the four heterocyclic
bases, adenine (abbreviated "A"), cytosine ("C"), guanine ("G") and thymine ("T");
(2) the pentose sugar derivative 2-deoxyribose, which is bonded by its 1'-carbon atom
to a ring nitrogen atom of the heterocyclic base; and (3) a monophosphate monoester
formed between a phosphoric acid molecule and the 5'-hydroxy group of the sugar
moiety. The nucleotides polymerize by the formation of phosphodiesters between the
5'-phosphate of one nucleotide and the 3'-hydroxy group of another nucleotide to give
a single strand of DNA. In nature, two of these single strands interact by hydrogen
bonding between complementary nucleotides, A being complementary with T and C
being complementary with G, to form "base pairs," an interaction that results in the
formation of the well-known DNA "double helix" of Watson and Crick. RNA is similar to
DNA except that the base thymine is replaced with uracil ("U") and the pentose sugar is
ribose itself rather than deoxyribose. In addition, RNA exists in nature predominantly as
a single strand; i.e., two strands do not normally combine to form a double helix.
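
The base-pairing rules just described can be expressed as a short lookup. The following Python sketch is illustrative only: the function names are hypothetical, and writing the partner strand in reverse order reflects the antiparallel orientation of the two strands of the double helix, a standard convention not spelled out above.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' like the input (hypothetical helper)."""
    # Complement each base, then reverse the order because the strands are antiparallel.
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand.upper()))


def dna_to_rna(strand: str) -> str:
    """Spell a DNA sequence in RNA letters: thymine (T) is replaced by uracil (U)."""
    return strand.upper().replace("T", "U")


print(reverse_complement("ACGTTG"))  # CAACGT
print(dna_to_rna("ACGTTG"))          # ACGUUG
```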

When referring to sequences of nucleotides in a polynucleotide, it is customary to
use the abbreviation for the base; i.e., A, C, G, and T (or U), to represent the entire
nucleotide containing that base. For example, a polynucleotide sequence denoted as
"ACG" means that an adenine nucleotide is bonded through a phosphate ester linkage to
a cytosine nucleotide, which is bonded through another phosphate ester linkage to a
guanine nucleotide. If the polynucleotide being described is DNA, then it is understood
that "A" refers to an adenine nucleotide which contains a deoxyribose sugar. If there is
any possibility of ambiguity, the "A" of a DNA molecule can be designated "deoxyA" or
simply "dA." The same is true for C and G. Since T occurs only in DNA and not in RNA,
there can be no ambiguity, so there is no need to refer to deoxyT or dT.

As a rough approximation, it can be said that the number of genes an organism
has is proportional to the organism's phenotypic complexity; i.e., the number of gene
products necessary to replicate the organism and allow it to function. The human
genome, presently considered one of the most complex, consists of approximately
60,000 - 100,000 genes and about three billion three hundred million base pairs. Each
of these genes codes for an RNA, most of which in turn encodes a particular protein
which performs a specific biochemical or structural function. A variance, also known as
a polymorphism or mutation, in the genetic code of any one of these genes may result in
the production of a gene product, usually a protein or an RNA, with altered biochemical
activity or with no activity at all. Such a change can result from as little as the addition,
deletion or substitution (transition or transversion) of a single nucleotide in the DNA
comprising a particular gene; a single-nucleotide variance of this kind is sometimes
referred to as a "single nucleotide polymorphism" or "SNP." The consequence of such a
mutation in the genetic code ranges from harmless to debilitating to fatal. There are
presently over 6700 human disorders believed to have a genetic component. For
example, hemophilia, Alzheimer's disease, Huntington's disease, Duchenne muscular
dystrophy and cystic fibrosis are known to be related to variances in the nucleotide
sequence of the DNA comprising certain genes. In addition, evidence is being amassed
suggesting that changes in certain DNA sequences may predispose an individual to a
variety of abnormal conditions such as obesity, diabetes, cardiovascular disease, central
nervous system disorders, auto-immune disorders and cancer. Variations in the DNA
sequence of specific genes have also been implicated in the differences observed
among patients in their responses to, for example, drugs, radiation therapy, nutritional
status and other medical interventions. Thus, the ability to detect DNA sequence
variances in an organism's genome is an important aspect of the inquiry into
relationships between such variances and medical disorders and responses to medical
interventions. Once an association has been established, the ability to detect the
variance(s) in the genome of a patient can be an extremely useful diagnostic tool. It
may even be possible, using early variance detection, to diagnose and potentially treat,
or even prevent, a disorder before the disorder has physically manifested itself.
Furthermore, variance detection can be a valuable research tool in that it may lead to
the discovery of genetic bases for disorders whose causes were hitherto unknown or
thought to be other than genetic. Variance detection may also be useful for guiding the
selection of an optimal therapy where there is a difference in response among patients
to one or more proposed therapies.

While the benefits of being able to detect variances in the genetic code are clear,
the practical aspects of doing so are daunting: it is estimated that sequence variations in
human DNA occur with a frequency of about 1 in 100 nucleotides when 50 to 100
individuals are compared. Nickerson, D.A., Nature Genetics, 1998, 223-240. Applied to
a genome of about 3.3 billion base pairs, this translates to as many as thirty million
variances in the human genome. Not all, in fact very few, of these variances have any
measurable effect on the physical well-being of humans. Detecting these 30 million
variances and then determining which of them are relevant to human health is clearly a
formidable task.

In addition to variance detection, knowledge of the complete nucleotide sequence
of an organism's genome would contribute immeasurably to the understanding of the
organism's overall biology; i.e., it would lead to the identification of every gene product,
its organization and arrangement in the organism's genome, and the sequences required
for controlling gene expression (i.e., production of each gene product) and replication.
In fact, the quest for such knowledge and understanding is the raison d'être for the
Human Genome Project, an international effort aimed at sequencing the entire human
genome. Once the sequence of a single genome is available, whatever the organism, it
then becomes useful to obtain the partial or complete sequence of other organisms of
that species, particularly those organisms within the species that exhibit different
characteristics, in order to identify DNA sequence differences that correlate with the
different characteristics. Such different characteristics may include, for microbial
organisms, pathogenicity on the negative side or the ability to produce a particular
polymer or to remediate pollution on the positive side. Differences in growth rate,
nutrient content or pest resistance are examples of differences which might be observed
among plants. Even among human beings, a difference in disease susceptibility or
response to a particular therapy might relate to a genetic, i.e., DNA sequence, variation.
As a result of the enormous potential utility to be realized from DNA sequence
information, in particular, identification of DNA sequence variances between individuals
of the same species, the demand for rapid, inexpensive, automated DNA sequencing
and variance detection procedures can be expected to increase dramatically in the
future.

Once the DNA sequence of a DNA segment (e.g., a gene, a cDNA or, on a larger
scale, a chromosome or an entire genome) has been determined, the existence of
sequence variances in that DNA segment among members of the same species can be
explored. Complete DNA sequencing is the definitive procedure for accomplishing this
task. Thus, it is possible to determine the complete sequence of a copy of a DNA
segment obtained from a different member of the species and simply compare that
complete sequence to the one previously obtained. However, current DNA sequencing
technology is costly, time consuming and, in order to achieve high levels of accuracy,
must be highly redundant. Most major sequencing projects require a 5- to 10-fold
coverage of each nucleotide to reach an acceptable error rate of 1 in 2,000 to 1 in
10,000 bases. In addition, DNA sequencing is an inefficient way to detect variances.
For example, a variance between any two copies of a gene, such as when two
chromosomes are being compared, may occur as infrequently as once in 1,000 or more
bases. Thus, only a small portion of the sequence, that in which the variance exists, is
of interest. However, if full sequencing is employed, a tremendous number of
nucleotides have to be sequenced to arrive at the desired information involving the
aforesaid small portion. For example, consider a comparison of ten versions of a 3,000
nucleotide DNA sequence for the purpose of detecting, say, four variances among
them. Even if only a 2-fold redundancy is employed (each strand of the double-stranded
3,000 nucleotide DNA segment from each individual is sequenced once), 60,000
nucleotides would have to be sequenced (10 X 3,000 X 2). In addition, it is more than
likely that problem areas will be encountered in the sequencing requiring additional runs
with new primers; thus, the project could engender the sequencing of as many as
100,000 nucleotides to determine four variances. A variety of procedures have been
developed over the past 15 years to identify sequence differences and to provide some
information about the location of the variant sites (Table 1). Using such a procedure, it
would only be necessary to sequence four relatively short portions of the 3,000 nt
(nucleotide) sequence. Furthermore, only a few samples would have to be sequenced in
each region because each variance produces a characteristic change (Table 1); so if,
for example, 22 of 50 samples exhibit such a characteristic change with a variance
detection procedure, then sequencing as few as four samples of the 22 would provide
information on the other 18. The length of the segments that require sequencing could,
depending on the variance detection procedure employed, be as short as 50-100 nt.
Thus, the scale of the sequencing project could be reduced to: 4 (sites) X 50 (nt per site)
X 2 (strands from each individual) X 2 (individuals per site), or only about 800
nucleotides. This amounts to about 1% of the sequencing required in the absence of a
preceding variance detection step.
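
The sequencing-burden arithmetic above can be restated as a small calculation. This Python sketch simply multiplies the factors named in the text; the function and parameter names are hypothetical, and the 100,000 nt figure is the upper estimate given above for a project that needs additional runs.

```python
def full_sequencing_nt(samples: int, segment_nt: int, strands: int = 2) -> int:
    """Nucleotides read when every strand of every sample is sequenced once."""
    return samples * segment_nt * strands


def targeted_sequencing_nt(sites: int, nt_per_site: int,
                           strands: int = 2, samples_per_site: int = 2) -> int:
    """Nucleotides read when only windows flagged by variance detection are sequenced."""
    return sites * nt_per_site * strands * samples_per_site


full = full_sequencing_nt(samples=10, segment_nt=3000)      # 10 x 3,000 x 2
targeted = targeted_sequencing_nt(sites=4, nt_per_site=50)  # 4 x 50 x 2 x 2
print(full, targeted)               # 60000 800
print(f"{targeted / full:.1%}")     # 1.3% of the 60,000 nt estimate
print(f"{targeted / 100_000:.1%}")  # 0.8% of the 100,000 nt upper estimate
```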

As presently practiced, the technique for determining the full nucleotide sequence
of a polynucleotide and that for detecting previously unknown variances or mutations in
related polynucleotides end up being the same; that is, even when the issue is the
presence or absence of a single nucleotide variance between related polynucleotides,
the complete sequences of at least a segment of the related polynucleotides are
determined and then compared. The only difference is that a variance detection
procedure such as those described in Table 1 may be employed as a first step to reduce
the amount of complete sequencing necessary in the detection of unknown variances.



WO 00/18967 






PCT/US99/22988 



TABLE 1 [content not legible in the source]






The two classical methods for carrying out complete nucleotide sequencing
are the Maxam and Gilbert chemical procedure (Proc. Nat. Acad. Sci. USA, 74,
560-564 (1977)) and the Sanger, et al., chain-terminating procedure (Proc. Nat. Acad.
Sci. USA, 74, 5463-5467 (1977)).

The Maxam-Gilbert method of complete nucleotide sequencing involves
end-labeling a DNA molecule with, for example, ³²P, followed by two discrete
reaction sequences involving two reactions each; i.e., four reactions overall. One of
these reaction sequences involves the selective methylation of the purine
nucleotides guanine (G) and adenine (A) in the polynucleotide being investigated
which, in most instances, is an isolated naturally-occurring polynucleotide such as
DNA. The N7 position of guanine methylates approximately five times as rapidly as
the N3 position of adenine. When heated in the presence of aqueous base, the
methylated bases are lost and a break in the polynucleotide chain occurs. The
reaction is more effective with methylated guanine than with methylated adenine so,
when the reaction product is subjected to electrophoresis on polyacrylamide gel
plates, G cleavage ladders are predominant. Under acidic conditions, on the other
hand, both methylated bases are removed effectively. Treatment with piperidine
cleaves the DNA at these abasic sites, generating sequencing ladders that correspond
to A + G.

Thus, four chemical reactions followed by electrophoretic analysis of the
resulting end-labeled ladder of cleavage products will reveal the exact nucleotide
sequence of a DNA molecule. It is key to the Maxam-Gilbert sequencing method
that only partial cleavage, on the order of 1-2% at each susceptible position, occurs.
This is because electrophoresis separates fragments by size, and only the labeled
fragment, whose length marks the distance from the label to the cut, is detected; a
molecule cut more than once reports only the cut nearest the label. To be meaningful,
the fragments produced should therefore represent, on average, a single modification
and cleavage per molecule. Then, when the fragments of all four reactions are aligned
according to size, the exact sequence of the target DNA can be determined.
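
The readout logic for the purine reactions can be sketched as follows. This is an idealized Python model, assuming exactly one cleavage per labeled molecule and that every susceptible position is represented in its lane; the names are hypothetical. Because G also appears in the A + G ladder, a band present in both lanes is called G, and a band present only in the A + G lane is called A. Pyrimidine positions are left as "-" because they are read from the other pair of reactions, which is not detailed above.

```python
def simulate_purine_lanes(seq: str) -> tuple[set[int], set[int]]:
    """Idealized G and A + G ladders for an end-labeled molecule.

    Cleavage at 0-based position i leaves a labeled fragment of length i
    (the bases between the label and the removed base).
    """
    g_lane = {i for i, base in enumerate(seq) if base == "G"}
    ag_lane = {i for i, base in enumerate(seq) if base in "AG"}
    return g_lane, ag_lane


def read_purines(g_lane: set[int], ag_lane: set[int], length: int) -> str:
    """Call G where the G lane has a band, A where only the A + G lane does."""
    calls = []
    for n in range(length):
        if n in g_lane:
            calls.append("G")
        elif n in ag_lane:
            calls.append("A")
        else:
            calls.append("-")  # pyrimidine: read from the other pair of reactions
    return "".join(calls)


g, ag = simulate_purine_lanes("GATCCAGT")
print(read_purines(g, ag, 8))  # GA---AG-
```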

The Sanger method for determining complete nucleotide sequences consists
of preparing four series of base-specifically chain-terminated labeled DNA fragments
by enzymatic polymerization. As in the Maxam-Gilbert procedure, four separate
reactions can be performed. In the Sanger method each of the four reaction
mixtures contains the same oligonucleotide template (either a single- or a
double-stranded DNA), the four nucleotides, A, G, C and T (one of which may be
labeled), a polymerase and a primer, the polymerase and primer being present to effect
the polymerization of the nucleotides into a complement of the template oligonucleotide.
To one of the four reaction mixtures is added an empirically determined amount of
the dideoxy derivative of one of the nucleotides. A small amount of the dideoxy
derivative of one of the remaining three nucleotides is added to a second reaction
mixture, and so on, resulting in four reaction mixtures each containing a different
dideoxy nucleotide. The dideoxy derivatives, by virtue of their missing 3'-hydroxyl
groups, terminate the enzymatic polymerization reaction upon incorporation into the
nascent oligonucleotide chain. Thus, in one reaction mixture, containing, say,
dideoxyadenosine triphosphate (ddATP), a series of oligonucleotide fragments is
produced, all ending in ddA, which when resolved by electrophoresis produce a series
of bands corresponding to the size of the fragment created up to the point that the
chain-terminating ddA became incorporated into the polymerization reaction.
Corresponding ladders of fragments can be obtained from each of the other reaction
mixtures, in which the oligonucleotide fragments end in C, G and T. The four sets of
fragments create a "sequence ladder," each rung of which represents the next
nucleotide in the sequence of bases comprising the subject DNA. Thus, the exact
nucleotide sequence of the DNA can simply be read off the electrophoresis gel plate
after autoradiography, or by computer analysis of chromatograms in the case of an
automated DNA sequencing instrument. As mentioned above, dye-labelled
chain-terminating dideoxynucleotides and modified polymerases that efficiently
incorporate modified nucleotides provide an improved method for chain-terminating
sequencing.
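
An idealized Python sketch of how the four chain-terminated series combine into a single sequence ladder follows. It assumes every possible termination product is present and labeled, takes the template segment just downstream of the primer written 3' to 5' (so each template base directly specifies the base added opposite it), and uses hypothetical names; stops and compressions are ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_lanes(template_3to5: str, primer_len: int) -> dict[str, list[int]]:
    """Fragment lengths in each ddNTP reaction (idealized, hypothetical helper).

    template_3to5 is the template just downstream of the primer, read 3'->5',
    so position i specifies the (i + 1)-th nucleotide added to the primer.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, template_base in enumerate(template_3to5):
        added = COMPLEMENT[template_base]
        # A ddNTP incorporated here ends the chain at primer + (i + 1) nucleotides.
        lanes[added].append(primer_len + i + 1)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Merge the four lanes by fragment size; each rung gives the next base."""
    rungs = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in rungs)


lanes = sanger_lanes("TACGGA", primer_len=18)
print(lanes["C"])          # [22, 23]
print(read_ladder(lanes))  # ATGCCT, the strand synthesized on the template
```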

Both the Maxam-Gilbert and Sanger procedures have their shortcomings.
They are both time-consuming, labor-intensive (particularly with regard to the
Maxam-Gilbert procedure, which has not been automated like the Sanger
procedure), expensive (e.g., the most optimized versions of the Sanger procedure
require very expensive reagents) and require a fair degree of technical expertise to
assure proper operation and reliable results. Furthermore, the Maxam-Gilbert
procedure suffers from a lack of specificity of the modification chemistry, which can
produce artifactual fragments and hence false ladder readings from the gel plate.
The Sanger method, on the other hand, is susceptible to template secondary
structure formation, which can interfere with the polymerization reaction.
This causes terminations of the polymerization at sites of secondary structure
(called "stops"), which can result in erroneous fragments appearing in the sequence
ladder, rendering parts of the sequence unreadable, although this problem is
ameliorated by the use of dye-labelled dideoxy terminators. Furthermore, both
sequencing methods are susceptible to "compressions," another result of DNA
secondary structure, which can affect fragment mobility during electrophoresis,
thereby rendering the sequence ladder unreadable or subject to erroneous
interpretation in the vicinity of the secondary structure. In addition, both methods
are plagued by uneven intensity of the ladder and by non-specific background
interference. These concerns are magnified when the issue is variance detection.
In order to discern a single nucleotide variance, the procedure employed must be
extremely accurate; a "mistake" in reading one nucleotide can result in a false
positive, i.e., an indication of a variance where none exists. Neither the
Maxam-Gilbert nor the Sanger procedure is capable of such accuracy in a single run.
In fact, the frequency of errors in a "one pass" sequencing experiment is equal to or
greater than 1%, which is on the order of ten times the frequency of actual DNA
variances (as low as about 1 in 1,000 bases) when any two versions of a sequence are
compared. The situation can be ameliorated somewhat by performing multiple runs
(usually in the context of a "shotgun" sequencing procedure) for each polynucleotide
being compared, but this simply increases cost in terms of equipment, reagents,
manpower and time. The high cost of sequencing becomes even less acceptable when
one considers that it is often not necessary, when looking for nucleotide sequence
variances among related polynucleotides, to determine the complete sequence of the
subject polynucleotides or even the exact nature of the variance (although, as will be
seen, in some instances even this is discernible using the method of this invention);
detection of the variance alone may be sufficient.
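
The accuracy argument can be made concrete with expected counts. The Python sketch below compares expected false calls at a one-pass error rate of 1% with expected true variances at about 1 in 1,000 bases over a hypothetical 3,000 nt segment. Treating repeated passes as independent, so that an identical miscall in every pass occurs at a rate no higher than the error rate raised to the number of passes, is a simplifying assumption rather than a claim made above.

```python
def expected_counts(length_nt: int, error_rate: float, variance_rate: float,
                    passes: int = 1) -> tuple[float, float]:
    """Return (expected false variance calls, expected true variances).

    Assumes errors are independent between passes, so a miscall repeated in
    every pass occurs at a rate no higher than error_rate ** passes.
    """
    false_calls = length_nt * error_rate ** passes
    true_variances = length_nt * variance_rate
    return false_calls, true_variances


one_pass = expected_counts(3000, error_rate=0.01, variance_rate=1 / 1000)
two_pass = expected_counts(3000, error_rate=0.01, variance_rate=1 / 1000, passes=2)
print(one_pass)  # about (30, 3): errors outnumber real variances roughly ten to one
print(two_pass)  # about (0.3, 3)
```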

While not avoiding all of the problems associated with the Maxam-Gilbert and
Sanger procedures, several techniques have been devised to at least make one or
the other of the procedures more efficient. One such approach has been to develop
ways to circumvent slab gel electrophoresis, one of the most time-consuming steps
in the procedures. For instance, in U.S. Patent Nos. 5,003,059 and 5,174,962, the
Sanger method is employed; however, the dideoxy derivative of each of the
nucleotides used to terminate the polymerization reaction is uniquely tagged with an
isotope of sulfur, ³²S, ³³S, ³⁴S or ³⁶S. Once the polymerization reactions are
complete, the chain-terminated sequences are separated by capillary zone
electrophoresis, which, compared to slab gel electrophoresis, increases resolution,
reduces run time and allows analysis of very small samples. The separated
chain-terminated sequences are then combusted to convert the incorporated isotopic
sulfur to isotopic sulfur dioxides (³²SO₂, ³³SO₂, ³⁴SO₂ and ³⁶SO₂). The isotopic sulfur
dioxides are then subjected to mass spectrometry. Since each isotope of sulfur is
uniquely related to one of the four sets of base-specifically chain-terminated
fragments, the nucleotide sequence of the subject DNA can be determined from the
mass spectrogram.
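
The decoding step of the isotope-tagging scheme can be sketched in Python. The assignment of sulfur isotopes to particular dideoxynucleotides is an arbitrary illustrative choice (the text above does not specify one), nominal masses are used, and oxygen is assumed to be ¹⁶O only; fragments are given as hypothetical (length, SO₂ mass) pairs.

```python
# Hypothetical isotope-to-terminator assignment; the text does not specify one.
ISOTOPE_TO_BASE = {32: "A", 33: "C", 34: "G", 36: "T"}
OXYGEN_16 = 16  # nominal mass; 17O and 18O are ignored in this sketch


def decode_so2_ladder(fragments: list[tuple[int, int]]) -> str:
    """fragments: (fragment length, nominal SO2 mass) pairs in any order."""
    calls = []
    for _, so2_mass in sorted(fragments):  # order by size, as after separation
        sulfur_mass = so2_mass - 2 * OXYGEN_16
        calls.append(ISOTOPE_TO_BASE[sulfur_mass])
    return "".join(calls)


print(decode_so2_ladder([(21, 66), (20, 64), (22, 68), (23, 65)]))  # AGTC
```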

Another method, disclosed in U.S. Patent No. 5,580,733, also incorporates
the Sanger technique but eliminates gel electrophoresis altogether. The method
involves taking each of the four populations of base-specific chain-terminated
oligonucleotides from the Sanger reactions and forming a mixture with a visible laser
light absorbing matrix such as 3-hydroxypicolinic acid. The mixtures are then
illuminated with visible laser light and vaporized, which occurs without further
fragmentation of the chain-terminated nucleic acid fragments. The vaporized
molecules which are charged are then accelerated in an electric field and the
mass-to-charge (m/z) ratio of the ionized molecules is determined by time-of-flight mass
spectrometry (TOF-MS). The molecular weights are then aligned to determine the
exact sequence of the subject DNA. By measuring the mass difference between
successive fragments in each of the mixtures, the lengths of fragments terminating
in A, G, C or T can then be inferred. A significant limitation of current MS instruments
is that polynucleotide fragments greater than 100 nucleotides in length (with many
instruments, 50 nucleotides) cannot be efficiently detected in routine use, especially if
the fragments are part of a complex mixture. This severe limitation on the size of
fragments that can be analyzed has limited the development of polynucleotide analysis
by MS. Thus, there is a need for a procedure that adapts large polynucleotides, such as
DNA, to the capabilities of current MS instruments. The present invention provides
such a procedure.
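
The mass-difference reasoning can be sketched in Python. This assumes a complete ladder in which successive fragments differ by exactly one nucleotide (for example, the four mixtures pooled), singly charged ions, and approximate average residue masses for the deoxynucleotide monophosphate units in a chain (dA 313.21, dC 289.18, dG 329.21, dT 304.20 Da); the tolerance and names are hypothetical.

```python
# Approximate average masses (Da) of deoxynucleotide monophosphate residues in a chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def call_bases(ladder_masses: list[float], tolerance: float = 2.0) -> str:
    """Call one base per gap between successive fragment masses ("N" if no match)."""
    masses = sorted(ladder_masses)
    calls = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        base, ref = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - delta))
        calls.append(base if abs(ref - delta) <= tolerance else "N")
    return "".join(calls)


# Hypothetical ladder: a 5,000 Da primer extended by G, A, T, C in turn.
ladder = [5000.0]
for b in "GATC":
    ladder.append(ladder[-1] + RESIDUE_MASS[b])
print(call_bases(ladder))  # GATC
```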

A further approach to nucleotide sequencing is disclosed in U.S. Patent No.
5,547,835. Again, the starting point is the Sanger sequencing strategy. The four
base-specific chain-terminated series of fragments are "conditioned" by, for
example, purification, cation exchange and/or mass modification. The molecular
weights of the conditioned fragments are then determined by mass spectrometry
and the sequence of the starting nucleic acid is determined by aligning the
base-specifically terminated fragments according to molecular weight.

Each of the above methods involves complete Sanger sequencing of a
polynucleotide prior to analysis by mass spectrometry. To detect genetic mutations,
i.e., variances, the complete sequence can be compared to a known nucleotide
sequence. Where the sequence is not known, comparison with the nucleotide
sequence of the same DNA isolated from another organism of the same species which
does not exhibit the abnormalities seen in the subject organism will likewise reveal
mutations. This approach, of course, requires running the Sanger procedure twice;
i.e., eight separate reactions. In addition, if a potential variance is detected, the
entire procedure would in most instances be run again, sequencing the opposite
strand using a different primer to make sure that a false positive had not been
obtained. When the specific nucleotide variance or mutation related to a particular
disorder is known, there is a wide variety of known methods for detecting a
variance without complete sequencing. For instance, U.S. Patent No. 5,605,798
describes such a method. The method involves obtaining a nucleic acid molecule
containing the target sequence of interest from a biological sample, optionally
amplifying the target sequence, and then hybridizing the target sequence to a
detector oligonucleotide which is specifically designed to be complementary to the
target sequence. Either the detector oligonucleotide or the target sequence is
"conditioned" by mass modification prior to hybridization. Unhybridized detector
oligonucleotide is removed and the remaining reaction product is volatilized and
ionized. Detection of the detector oligonucleotide by mass spectrometry indicates
the presence of the target nucleic acid sequence in the biological sample and thus
confirms the diagnosis of the variance-related disorder.
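
The design principle of the detector oligonucleotide can be sketched with a perfect-match model in Python. Real hybridization depends on stringency and can tolerate mismatches, and the mass-modification and mass spectrometric detection steps are not modeled; the function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def design_detector(target: str) -> str:
    """Detector oligonucleotide: the antiparallel complement of the target, 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(target))


def detector_hybridizes(detector: str, sample: str) -> bool:
    """Perfect-match model: True if the sample contains the site the detector pairs with."""
    site = design_detector(detector)  # the complement of the complement is the target
    return site in sample


detector = design_detector("GATTACAG")
print(detector)                                       # CTGTAATC
print(detector_hybridizes(detector, "TTGATTACAGTT"))  # True
print(detector_hybridizes(detector, "TTGATTCCAGTT"))  # False: one-base variance
```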

Variance detection procedures can be divided into two general categories,
although there is a considerable degree of overlap. One category, the variance
discovery procedures, is useful for examining DNA segments for the existence,
location and characteristics of new variances. To accomplish this, variance
discovery procedures may be combined with DNA sequencing.

The second group of procedures, variance typing (sometimes referred to as
genotyping) procedures, is useful for repetitive determination of one or more
nucleotides at a particular site in a DNA segment when the location of a variance or
variances has previously been identified and characterized. In this type of analysis,
it is often possible to design a very sensitive test of the status of a particular
nucleotide or nucleotides. This technique, of course, is not well suited to the
discovery of new variances.

As noted above, Table 1 is a list of a number of existing techniques for
nucleotide examination. The majority of these are used primarily in new variance
determination. There are a variety of other methods, not shown, for genotyping.
Like the Maxam-Gilbert and Sanger sequencing procedures, these techniques are
generally time-consuming, tedious and require a relatively high skill level to achieve
the maximum degree of accuracy possible from each procedure. Even then, some
of the techniques listed are inherently less accurate than would be desirable.

The methods of Table 1, though primarily devised for variance discovery, can
also be used when a variant nucleotide has already been identified and the goal is to
determine its status in one or more unknown DNA samples (variance typing or
genotyping). Some of the methods that have been developed specifically for
genotyping include (1) primer extension methods in which dideoxynucleotide
termination of the primer extension reaction occurs at the variant site, generating
extension products of different length or with different terminal nucleotides, which
can then be determined by electrophoresis, mass spectrometry or fluorescence in a
plate reader; (2) hybridization methods in which oligonucleotides corresponding to
the two possible sequences at a variant site are attached to a solid surface and
hybridized with probes from the unknown sample; (3) restriction fragment length
polymorphism analysis, wherein a restriction endonuclease recognition site includes
the polymorphic nucleotide in such a manner that the site is cleavable with one
variant nucleotide but not another; (4) methods such as "TaqMan" involving
differential hybridization and consequent differential 5' endonuclease digestion of
labelled oligonucleotide probes in which there is fluorescence resonance energy
transfer (FRET) between two fluors on the probe that is abrogated by nuclease
digestion of the probe; (5) other FRET-based methods involving labelled
oligonucleotide probes called molecular beacons, which exploit allele-specific
hybridization; (6) ligation-dependent methods that require enzymatic ligation of two
oligonucleotides across a polymorphic site that is perfectly matched to only one of
them; and (7) allele-specific oligonucleotide priming in a polymerase chain reaction
(PCR). Landegren, U., et al., 1998. Reading Bits of Genetic Information: Methods for
Single-nucleotide Polymorphism Analysis. Genome Research 8(8):769-76.
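
Method (3) can be illustrated with a Python sketch that predicts fragment lengths for two alleles. The EcoRI site (GAATTC, cut after the first G on the strand shown) is used only as a familiar example; the sequences are hypothetical, and the sketch considers a single strand of a linear molecule.

```python
def fragment_lengths(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every copy of site.

    Defaults model EcoRI, which recognizes GAATTC and cuts after the first G.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


allele_1 = "TTTTGAATTCTTTT"  # hypothetical allele containing the site
allele_2 = "TTTTGAGTTCTTTT"  # single-base variant that destroys the site
print(fragment_lengths(allele_1))  # [5, 9]
print(fragment_lengths(allele_2))  # [14]
```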

When complete sequencing of large templates, such as the entire genome of
a virus, a bacterium or a eukaryote (e.g., higher organisms including man), or the
repeated sequencing of a large DNA region or regions from different strains or
individuals of a given species for purposes of comparison is desired, it becomes
necessary to implement strategies for making libraries of templates for DNA
sequencing. This is because conventional chain-terminating sequencing (i.e., the
Sanger procedure) is limited by the resolving power of the analytical procedure used
to create the nucleotide ladder of the subject polynucleotide. For gels, this resolving
power is approximately 500 - 800 nt at a time. For mass spectrometry, the limitation
is the length of a polynucleotide which can be efficiently vaporized prior to detection
in the instrument. Although larger fragments have been analyzed by highly
specialized procedures and instrumentation, presently this limit is approximately
50 - 60 nt. However, in large scale sequencing projects such as the Human Genome
Project, "markers" (DNA segments of known chromosomal location whose presence
can be relatively easily ascertained by the polymerase chain reaction (PCR)
technique and which, therefore, can be used as a point of reference for mapping
new areas of the genome) are currently about 100 kilobases (kb) apart. The
markers at 100 kb intervals must be connected by efficient sequencing strategies. If
the analytical method used is gel electrophoresis, then sequencing a 100 kb stretch
of DNA would require hundreds of sequencing reactions, since at 500 - 800 nt per
reaction even a single pass over 100 kb takes 125 - 200 reactions. A fundamental
question which must be addressed is how to divide up the 100 kb segment (or
whatever size is being dealt with) to optimize the process; i.e., to minimize the number
of sequencing reactions and sequence assembly work necessary to generate a
complete sequence with the desired level of accuracy. A key issue in this regard is
how to initially fragment the DNA in such a manner that the fragments, once
sequenced, can be correctly reassembled to recreate the full-length target DNA.
Presently, two general approaches provide both sequence-ready fragments and the
information necessary to recombine the sequences into the full-length target DNA:
"shotgun sequencing" (see, e.g., Venter, J. C., et al., Science, 1998, 280:1540-1542;
Weber, J. L. and Myers, E. W., Genome Research, 1997, 7:401-409; Andersson, B.,
et al., DNA Sequence, 1997, 7:63-70) and "directed DNA sequencing" (see, e.g.,
Voss, H., et al., Biotechniques, 1993, 15:714-721; Kaczorowski, T., et al., Anal.
Biochem., 1994, 221:127-135; Lodhi, M. A., et al., Genome Research, 1996, 6:10-18).

Shotgun sequencing involves the creation of a large library of random
fragments or "clones" in a sequence-ready vector such as a plasmid or phagemid.
To arrive at a library in which all portions of the original sequence are relatively
equally represented, DNA which is to be shotgun sequenced is often fragmented by
physical procedures such as sonication, which has been shown to produce nearly
random fragmentation. Clones are then selected at random from the shotgun library
for sequencing. The complete sequence of the DNA is then assembled by
identifying overlapping sequences in the short (approx. 500 nt) shotgun sequences.
In order to assure that the entire target region of the DNA is represented among the
randomly selected clones and to reduce the frequency of errors (incorrectly assigned
overlaps), a high degree of sequencing redundancy is necessary; for example, 7- to
10-fold. Even with such high redundancy, additional sequencing is often required to
fill gaps in the coverage. Even then, the presence of repeat sequences such as Alu
(a 300 base-pair sequence which occurs in 500,000 - 1,000,000 copies per haploid
genome) and LINEs ("Long INterspersed DNA sequence Elements", which can be
7,000 bases long and may be present in as many as 100,000 copies per haploid
genome), either of which may occur in different locations of multiple clones, can
render DNA sequence re-assembly problematic. For instance, different members of
these sequence families can be over 90% identical, which can sometimes make it
very difficult to determine sequence relationships on opposite sides of such repeats.

Figure X illustrates the difficulties of the shotgun sequencing approach in a
hypothetical 10 kb sequence modeled after the sequence reported in
Martin-Gallardo, et al., Nature Genetics, 1992, 1:34-39.
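
The 7- to 10-fold redundancy quoted above can be rationalized with the classical Lander-Waterman (Poisson) coverage model. This model is not part of the present document and is used here only as an illustrative assumption: N random reads of length L over a target of length G give an average coverage c = NL/G, leave roughly a fraction e^-c of bases unsequenced, and leave roughly N·e^-c gaps. The sketch assumes uniformly random fragment positions, zero required overlap and no repeats, so it understates the real difficulty created by Alu and LINE elements.

```python
import math

def reads_needed(target_len: int, read_len: int, coverage: float) -> int:
    """Number of random reads N giving average fold-coverage c = N * L / G."""
    return math.ceil(coverage * target_len / read_len)

def uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of target bases in no read: e^-c."""
    return math.exp(-coverage)

def expected_gaps(target_len: int, read_len: int, coverage: float) -> float:
    """Expected number of gaps, N * e^-c (zero minimum overlap assumed)."""
    return reads_needed(target_len, read_len, coverage) * math.exp(-coverage)

# A 100 kb marker interval covered by ~500 nt shotgun reads.
for c in (1, 4, 7, 10):
    print(f"{c:>2}-fold: {reads_needed(100_000, 500, c):>5} reads, "
          f"uncovered ~{uncovered_fraction(c):.1e}, "
          f"gaps ~{expected_gaps(100_000, 500, c):.2f}")
```

For a 100 kb interval and 500 nt reads, this sketch predicts about 1,400 reads and roughly one remaining gap at 7-fold coverage, and about 2,000 reads with fewer than 0.1 expected gaps at 10-fold, consistent with the observation that gap filling is often still required.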

Directed DNA sequencing, the second general approach, also entails making
a library of clones, often with large inserts (e.g., cosmid, P1, PAC or BAC libraries).
In this procedure, the location of the clones in the region to be sequenced is then
mapped to obtain a set of clones that constitutes a minimum-overlap tiling path
spanning the region to be sequenced. Clones from this minimal set are then
sequenced by procedures such as "primer walking" (see, e.g., Voss, supra). In this
procedure, the end of one sequence is used to select a new sequencing primer with
which to begin the next sequencing reaction, the end of the second sequence is
used to select the next primer, and so on. Assembly of a complete DNA sequence is
easier by directed sequencing, and less sequencing redundancy is required, since
both the order of the clones and the completeness of coverage are known from the
clone map. On the other hand, assembling the map itself requires significant effort.
Furthermore, the speed with which new sequencing primers can be synthesized and
the cost of doing so are often limiting factors in primer walking. While a
variety of methods for simplifying new primer construction have aided this
process (see, e.g., Kaczorowski, et al. and Lodhi, et al., supra), directed DNA
sequencing remains a valuable but often expensive and slow procedure.

Most large-scale sequencing projects employ aspects of both shotgun
sequencing and directed sequencing. For example, a detailed map might be made
of a large insert library (e.g., BACs) to identify a minimal set of clones which gives
complete coverage of the target region, but then sequencing of each of the large
inserts is carried out by a shotgun approach; e.g., fragmenting the large insert and
re-cloning the fragments in a more optimal sequencing vector (see, e.g.,
Chen, C. N., Nucleic Acids Research, 1996, 24:4034-4041). The shotgun and directed
procedures are also used in a complementary manner in which specific regions not
covered by an initial shotgun experiment are subsequently determined by directed
sequencing.

Thus, there are significant limitations to both the shotgun and directed
sequencing approaches to complete sequencing of large molecules such as those
required in genomic DNA sequencing projects. However, both procedures would
benefit if the usable read length of contiguous DNA were expanded beyond the current
500 - 800 nt which can be effectively sequenced by the Sanger method. For
example, longer read lengths would permit greater distances between landmarks,
which would significantly improve directed sequencing by reducing the need for
high-resolution maps.

A major limitation of current sequencing procedures is the high error rate
(Kristensen, T., et al., DNA Sequencing, 2:243-346, 1992; Kurshid, F. and Beck, S.,
Analytical Biochemistry, 208:138-143, 1993; Fichant, G. A. and Quentin, Y., Nucleic
Acids Research, 23:2900-2908, 1995). It is well known that many of the errors
associated with the Maxam-Gilbert and Sanger procedures are systematic; i.e., the
errors are not random; rather, they occur repeatedly. To address this, two
mechanistically different sequencing methods may be used so that the systematic
errors in one may be detected, and thus corrected, by the second and vice versa.
Since a significant fraction of the cost of current sequencing methods is associated
with the need for high redundancy to reduce sequencing errors, the use of two
procedures can reduce the overall cost of obtaining highly accurate DNA sequence.

The production and/or chemical cleavage of polynucleotides composed of
ribonucleotides and deoxyribonucleotides has been previously described. In
particular, mutant polymerases that incorporate both ribonucleotides and
deoxyribonucleotides into a polynucleotide have been described; production of
mixed ribo- and deoxyribo-containing polynucleotides by polymerization has been
described; and generation of sequence ladders from such mixed polynucleotides,
exploiting the well-known lability of the ribo sugar to chemical base, has been
described.

The use of such procedures, however, has been limited to: (i)
polynucleotides where one ribonucleotide and three deoxyribonucleotides are
incorporated; (ii) cleavage at ribonucleotides effected using chemical base; (iii)
pursuit of only partial cleavage of the ribonucleotide-containing polynucleotides;
and (iv) utility confined to the production of sequence ladders, which are resolved
electrophoretically.

In addition, the chemical synthesis of polynucleotide primers containing a
single ribonucleotide, which at a subsequent step is substantially completely cleaved 
by chemical base, has been reported. The size of a primer extension product is then 
determined by mass spectrometry or other methods. 
SUMMARY OF THE INVENTION

It is clear from the foregoing that there exists a need for a simple, low cost, 
rapid, yet sensitive and accurate, method for analyzing polynucleotides such as, 
without limitation, DNA, to determine both complete nucleotide sequences and the 
presence of variance(s). Further, there is a need for methods to enable assembly of
very long DNA sequences across repeat-dense regions. The methods of the
present invention fulfill each of these needs. In general, the present invention
supplies new methods for genotyping, DNA sequencing and variance detection 
based on specific cleavage of DNA and other polynucleotides modified by enzymatic 
incorporation of chemically modified nucleotides. 

Thus, in one aspect, this invention relates to a method for cleaving a
polynucleotide, comprising:

a. replacing a natural nucleotide at substantially each point of occurrence
in a polynucleotide with a modified nucleotide to form a modified polynucleotide
wherein said modified nucleotide is not a ribonucleotide;

b. contacting said modified polynucleotide with a reagent or reagents
which cleave(s) the modified polynucleotide at substantially each said point of
occurrence.
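
The method just described can be pictured as a simple string operation. The sketch below is a minimal illustration, not the chemistry itself: it assumes complete cleavage immediately 5' of every occurrence of the chosen nucleotide (so the modified nucleotide begins the downstream fragment), and it sums approximate average residue masses for DNA (about 313.21, 289.18, 329.21 and 304.20 Da for dA, dC, dG and dT), ignoring whatever end groups a particular cleavage reagent leaves. The names `cleave_at` and `fragment_mass` and the example sequence are hypothetical.

```python
# Approximate average masses (Da) of deoxynucleotide residues within a DNA chain.
# End groups created by any particular cleavage chemistry are ignored here.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

def cleave_at(seq: str, base: str) -> list[str]:
    """Complete cleavage 5' of every occurrence of `base`.

    Illustrative assumption: the modified nucleotide stays at the 5' end
    of the downstream fragment.
    """
    cuts = [i for i, b in enumerate(seq) if b == base and i > 0]
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]

def fragment_mass(fragment: str) -> float:
    """Sum of approximate residue masses, rounded to 0.01 Da."""
    return round(sum(RESIDUE_MASS[b] for b in fragment), 2)

fragments = cleave_at("GATTACAGTCA", "T")
print(fragments)                      # ['GA', 'T', 'TACAG', 'TCA']
print([fragment_mass(f) for f in fragments])
```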

In another aspect, this invention relates to the above-described method
for use in detection of variance in nucleotide sequence in related polynucleotides by
the additional steps of:

c. determining the masses of said fragments obtained from step b; and,

d. comparing the masses of said fragments with the masses of fragments
expected from cleavage of a related polynucleotide of known sequence, or

e. repeating steps a - c with one or more related polynucleotides of
unknown sequence and comparing the masses of said fragments of said
polynucleotide with the masses of fragments obtained from the related
polynucleotides.
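
The comparison in steps c and d amounts to comparing two multisets of fragment masses. The following sketch keeps the same illustrative assumptions as above (cleavage 5' of each occurrence, approximate residue masses, no end-group corrections); the function names and the example substitution are hypothetical.

```python
from collections import Counter

RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

def fragment_masses(seq: str, base: str) -> Counter:
    """Multiset of fragment masses after complete cleavage 5' of each `base`."""
    bounds = [0] + [i for i, b in enumerate(seq) if b == base and i > 0] + [len(seq)]
    return Counter(round(sum(RESIDUE_MASS[x] for x in seq[s:e]), 2)
                   for s, e in zip(bounds, bounds[1:]))

def compare(reference: str, sample: str, base: str) -> tuple[list[float], list[float]]:
    """Masses expected from the reference but absent in the sample, and vice versa."""
    ref, smp = fragment_masses(reference, base), fragment_masses(sample, base)
    return sorted((ref - smp).elements()), sorted((smp - ref).elements())

reference = "GATTACAGTCA"
sample = "GATTGCAGTCA"          # hypothetical A->G substitution at position 5
lost, gained = compare(reference, sample, "T")
print(lost, gained)             # [1549.01] [1565.01], a +16.00 Da shift
```

Because every pair of DNA residues differs in mass (the smallest difference, dA versus dT, is about 9 Da), a single substitution inside a fragment shifts that fragment's mass, while a substitution that creates or destroys an occurrence of the cleaved nucleotide changes the number of fragments instead. Whether a given shift is resolvable in practice depends on the instrument.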

A further aspect of this invention is the use of the first method above whereby
the nucleotide sequence of a polynucleotide is determined, by the additional steps
of:

c. determining the masses of said fragments obtained from step 1b;

d. repeating steps 1a, 1b and 1c, each time replacing a different natural
nucleotide in said polynucleotide with a modified nucleotide until each natural
nucleotide in said polynucleotide has been replaced with a modified nucleotide,
each modified polynucleotide has been cleaved and the masses of the cleavage
fragments have been determined; and,

e. constructing said nucleotide sequence of said polynucleotide from said
masses of said fragments.
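
A fragment mass by itself yields a base composition rather than an order; constructing the full sequence, as in step e, depends on combining the constraints from all four cleavage reactions. The sketch below illustrates only the first step, enumerating the compositions consistent with one measured mass. The 0.5 Da tolerance, the length cap and the per-base limit (under the cut-5'-of-the-base assumption, an internal fragment from cleavage at T carries at most one T) are illustrative assumptions, and `compositions` is a hypothetical name.

```python
from itertools import product
from typing import Optional

RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
BASES = "ACGT"

def compositions(mass: float, tol: float = 0.5, max_len: int = 10,
                 max_per_base: Optional[dict] = None) -> list[dict]:
    """Enumerate base counts whose summed residue mass lies within `tol` Da of `mass`."""
    limits = {b: max_len for b in BASES}
    limits.update(max_per_base or {})
    hits = []
    for counts in product(*(range(limits[b] + 1) for b in BASES)):
        if not 0 < sum(counts) <= max_len:
            continue
        total = sum(n * RESIDUE_MASS[b] for n, b in zip(counts, BASES))
        if abs(total - mass) <= tol:
            hits.append(dict(zip(BASES, counts)))
    return hits

# Fragment from cleavage at T: at most one T, located at its 5' end.
print(compositions(1549.01, max_per_base={"T": 1}))
# -> [{'A': 2, 'C': 1, 'G': 1, 'T': 1}]
```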

Another aspect of this invention is the use of the first-mentioned method
above whereby a polynucleotide known to contain a polymorphism or mutation is
genotyped, by:

using as the natural nucleotide to be replaced a nucleotide known to be
involved in said polymorphism or mutation;

replacing the natural nucleotide by amplifying the portion of the polynucleotide
using a modified nucleotide to form a modified polynucleotide;

cleaving the modified polynucleotide into fragments at each point of
occurrence of the modified nucleotide; and,

analyzing the fragments to determine genotype.
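
For genotyping, only the fragments that differ between the two alleles matter. This minimal sketch, under the same hypothetical cut-site and mass assumptions as above, predicts the fragment masses of each allele after cleavage at the polymorphic nucleotide (here A, for a hypothetical A/G polymorphism), keeps the allele-specific masses and calls a genotype from which of them are observed. A real assay would also need intensity thresholds and controls, which are omitted; `call_genotype` is a hypothetical name.

```python
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

def fragment_masses(seq: str, base: str) -> set[float]:
    """Fragment masses after complete cleavage 5' of each occurrence of `base`."""
    bounds = [0] + [i for i, b in enumerate(seq) if b == base and i > 0] + [len(seq)]
    return {round(sum(RESIDUE_MASS[x] for x in seq[s:e]), 2)
            for s, e in zip(bounds, bounds[1:])}

def call_genotype(observed: set[float], allele1: str, allele2: str,
                  base: str, tol: float = 0.5) -> str:
    """Call 1/1, 1/2 or 2/2 from the allele-diagnostic fragment masses."""
    m1, m2 = fragment_masses(allele1, base), fragment_masses(allele2, base)

    def seen(markers: set[float]) -> bool:
        return any(abs(o - m) <= tol for o in observed for m in markers)

    has1, has2 = seen(m1 - m2), seen(m2 - m1)   # masses unique to each allele
    if has1 and has2:
        return "1/2"
    if has1:
        return "1/1"
    if has2:
        return "2/2"
    return "no call"

allele1, allele2 = "GATTACAGTCA", "GATTGCAGTCA"    # hypothetical A/G polymorphism
het = fragment_masses(allele1, "A") | fragment_masses(allele2, "A")
print(call_genotype(het, allele1, allele2, "A"))   # 1/2
```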

In the method immediately above, analysis of the fragments by
electrophoresis, mass spectrometry or FRET detection is an aspect of this
invention.

Another aspect of this invention is a method for cleaving a polynucleotide,
comprising:

a. replacing a first natural nucleotide at substantially each point of
occurrence in a polynucleotide with a first modified nucleotide to form a once
modified polynucleotide;

b. replacing a second natural nucleotide at substantially each point of
occurrence in the once modified polynucleotide with a second modified nucleotide
to form a twice modified polynucleotide; and,

c. contacting said twice modified polynucleotide with a reagent or
reagents which cleave the twice modified polynucleotide at each point in said twice
modified polynucleotide where said first modified nucleotide is followed immediately
by, and linked by a phosphodiester or modified phosphodiester linkage to, said
second modified nucleotide.
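
Cleavage only where the first modified nucleotide is immediately followed by the second can be sketched as a search for that dinucleotide. Placing the cut between the two nucleotides, and the function name `cleave_at_dinucleotide`, are illustrative assumptions.

```python
def cleave_at_dinucleotide(seq: str, first: str, second: str) -> list[str]:
    """Cut only between `first` and an immediately following `second` (5' to 3')."""
    cuts = [i + 1 for i in range(len(seq) - 1)
            if seq[i] == first and seq[i + 1] == second]
    bounds = [0] + cuts + [len(seq)]
    return [seq[s:e] for s, e in zip(bounds, bounds[1:])]

print(cleave_at_dinucleotide("GATTACAGTCA", "C", "A"))   # ['GATTAC', 'AGTC', 'A']
```

In a random sequence a given dinucleotide occurs at about 1 in 16 positions rather than 1 in 4, so this strategy yields fewer, longer fragments than cleavage at a single nucleotide.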

In a further aspect of this invention, variance in the nucleotide sequence of
related polynucleotides is detected using the method immediately above by the
additional steps of:

d. determining the masses of said fragments obtained from step c;

e. comparing the masses of said fragments with the masses of fragments
expected from cleavage of a related polynucleotide of known sequence, or

f. repeating steps a - d with one or more related polynucleotides of
unknown sequence and comparing the masses of said fragments with masses of
fragments obtained from cleavage of the related polynucleotides.

An aspect of this invention is a method for detecting variance in nucleotide
sequence in related polynucleotides, comprising:

a. replacing three of four natural nucleotides at substantially each point of
occurrence in a polynucleotide with three stabilizing modified nucleotides to form a
modified polynucleotide having one remaining natural nucleotide;

b. cleaving said modified polynucleotide into fragments at substantially
each point of occurrence of said one remaining natural nucleotide;

c. determining the masses of said fragments; and,

d. comparing the masses of said fragments with the masses of fragments
expected from cleavage of a related polynucleotide of known sequence, or

e. repeating steps a - c with one or more related polynucleotides of
unknown sequence and comparing the masses of said fragments with masses
obtained from cleavage of the related polynucleotides.

In another aspect of this invention, the remaining natural nucleotide in the
method immediately above is replaced with a destabilizing modified nucleotide.
A further aspect of this invention is a method for detecting variance in
nucleotide sequence in related polynucleotides, comprising:

a. replacing two or more natural nucleotides at substantially each point of
occurrence in a polynucleotide with two or more modified nucleotides wherein each
said modified nucleotide has a different cleaving characteristic from each other of
said modified nucleotides, to form a modified polynucleotide;

b. cleaving said modified polynucleotide into first fragments at
substantially each point of occurrence of a first of said two or more modified
nucleotides;

c. cleaving said first fragments into second fragments at each point of
occurrence of a second of said two or more modified nucleotides in said first
fragments;

d. determining the masses of said first fragments and said second
fragments; and,

e. comparing the masses of said first fragments and said second
fragments with the masses of first fragments and second fragments expected from
the cleavage of a related polynucleotide of known sequence, or

f. repeating steps a - d with one or more related polynucleotides of
unknown sequence and comparing the masses of said first and second fragments
with masses obtained from the cleavage of the related polynucleotides.

It is an aspect of this invention that, in the above method, the steps are
repeated, each time replacing a different pair of natural nucleotides with modified
nucleotides; that is, given four natural nucleotides 1, 2, 3 and 4, replacing 1 and 3
in one experiment, 2 and 4 in another, 1 and 4 in yet another, 2 and 3 in another
or 3 and 4 in a final experiment with modified nucleotides.

It is an aspect of this invention that the modified polynucleotides obtained by
the methods just above can be cleaved in a mass spectrometer, in particular, a
tandem mass spectrometer.
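
The two-stage cleavage, and the parent/daughter relationship it has in a tandem mass spectrometer, can be sketched as follows: each first fragment (the parent) is cleaved again at the second modified nucleotide to give its second fragments (the daughters). The cut-site convention (cleavage 5' of each modified nucleotide) and the function names are illustrative assumptions.

```python
def cut_5prime_of(fragment: str, base: str) -> list[str]:
    """Complete cleavage 5' of each `base` (illustrative cut-site assumption)."""
    bounds = [0] + [i for i, b in enumerate(fragment) if b == base and i > 0]
    bounds.append(len(fragment))
    return [fragment[s:e] for s, e in zip(bounds, bounds[1:])]

def two_stage_cleavage(seq: str, first: str, second: str) -> list[tuple[str, list[str]]]:
    """Pair each first fragment with the second fragments derived from it."""
    return [(parent, cut_5prime_of(parent, second))
            for parent in cut_5prime_of(seq, first)]

for parent, daughters in two_stage_cleavage("GATTACAGTCA", "T", "C"):
    print(parent, "->", daughters)
# GA -> ['GA']
# T -> ['T']
# TACAG -> ['TA', 'CAG']
# TCA -> ['T', 'CA']
```

Pooled together, the second fragments are what cleavage at both nucleotides at once would give; keeping the parent for each set of daughters, as a tandem instrument does, preserves the information about which daughters were contiguous.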

A further aspect of this invention is a method for determining nucleotide
sequence in a polynucleotide, comprising:

a. replacing a natural nucleotide at a percentage of points of occurrence
in a polynucleotide with a modified nucleotide to form a modified polynucleotide
wherein said modified nucleotide is not a ribonucleotide;



WO 00/18967 PCT/US99/22988



b. cleaving said modified polynucleotide into fragments at substantially
each point of occurrence of said modified nucleotide;

c. repeating steps a and b, each time replacing a different natural
nucleotide in said polynucleotide with a modified nucleotide;

d. determining the masses of said fragments obtained from each
cleavage; and,

e. constructing said sequence of said polynucleotide from said masses, or

f. analyzing a sequence ladder obtained from the fragments in step c.

Another aspect of this invention is a method for determining nucleotide
sequence in a polynucleotide, comprising:

a. replacing a natural nucleotide at a first percentage of points of
occurrence in a polynucleotide with a modified nucleotide to form a modified
polynucleotide wherein said modified nucleotide is not a ribonucleotide;

b. cleaving said modified polynucleotide into fragments at a second
percentage of said points of occurrence of said modified nucleotide such that the
combination of said first percentage and said second percentage results in partial
cleavage of said modified polynucleotide;

c. repeating steps a and b, each time replacing a different natural
nucleotide in said polynucleotide with a modified nucleotide;

d. determining the masses of said fragments obtained from each
cleavage reaction; and,

e. constructing said sequence of said polynucleotide from said masses, or

f. analyzing a sequence ladder obtained from said fragments from steps
a and b.
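
When cleavage is partial, each labeled molecule reports only its first cleaved site, so band intensity decays with distance from the label. The sketch below assumes a 5'-end label, independent modification and cleavage at each site, a cut on the 3' side of each modified nucleotide, and an effective per-site cleavage probability equal to the product of the two percentages in steps a and b. All of these, and the name `ladder_intensities`, are illustrative assumptions rather than the specified chemistry.

```python
def ladder_intensities(seq: str, base: str, incorporation: float,
                       cleavage_efficiency: float) -> list[tuple[int, float]]:
    """Expected fraction of 5'-labeled molecules in each band (illustrative model)."""
    p = incorporation * cleavage_efficiency          # effective per-site cut probability
    # Band length n: the labeled fragment ends with the modified nucleotide at position n.
    lengths = [i + 1 for i, b in enumerate(seq) if b == base and i + 1 < len(seq)]
    # A molecule lands in band k only if the k earlier sites stayed uncut.
    bands = [(n, p * (1 - p) ** k) for k, n in enumerate(lengths)]
    bands.append((len(seq), (1 - p) ** len(lengths)))  # uncut, full-length molecule
    return bands

for length, fraction in ladder_intensities("GATTACAGTCA", "T", 0.2, 0.5):
    print(length, round(fraction, 3))
# 3 0.1
# 4 0.09
# 9 0.081
# 11 0.729
```

With complete cleavage (both percentages at 100%) every labeled molecule stops at the first site and no ladder forms, which is why the combination of the two percentages must give partial cleavage.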

An aspect of this invention is a method for determining nucleotide sequence
in a polynucleotide, comprising:

a. replacing two or more natural nucleotides at substantially each point of
occurrence in a polynucleotide with two or more modified nucleotides to form a
modified polynucleotide;

b. separating said modified polynucleotide into two or more aliquots, the
number of said aliquots being the same as the number of natural nucleotides
replaced in step a; and,

c. cleaving said modified polynucleotide in each said aliquot into
fragments at substantially each point of occurrence of a different one of said
modified nucleotides such that each of said aliquots contains fragments from
cleavage at a different modified nucleotide than each other said aliquot;

d. determining masses of said fragments; and,

e. constructing said nucleotide sequence from said masses; or,

f. cleaving said modified polynucleotide in each said aliquot into
fragments at a percentage of points of occurrence of a different modified nucleotide
such that each of said aliquots contains fragments from cleavage at a different
modified nucleotide than each other said aliquot; and,

g. analyzing a sequence ladder obtained from said fragments in step f.

Furthermore, an aspect of this invention is a method for determining
nucleotide sequence in a polynucleotide, comprising:

a. replacing a first natural nucleotide at a percentage of points of
incorporation in a polynucleotide with a first modified nucleotide to form a first
partially modified polynucleotide wherein said first modified nucleotide is not a
ribonucleotide;

b. cleaving said first partially modified polynucleotide into fragments using
a cleaving procedure of known cleavage efficiency to form a first set of
nucleotide-specific cleavage products;

c. repeating steps a and b, replacing a second, a third and a fourth
natural nucleotide with a second, third and fourth modified nucleotide to form a
second, third and fourth partially modified polynucleotide which, upon cleavage,
afford a second, third and fourth set of nucleotide-specific cleavage products;

d. performing gel electrophoresis on said first, second, third and fourth
sets of nucleotide-specific cleavage products to form a sequence ladder; and,

e. reading said sequence of said polynucleotide from said sequence
ladder.
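
Step e, reading the sequence from the four nucleotide-specific ladders, can be sketched as merging their band lengths. The sketch assumes, as in a Sanger-type read, that a band of length n from the labeled 5' end marks the corresponding nucleotide at position n; the function name `read_ladders` and the example ladders are hypothetical.

```python
def read_ladders(ladders: dict[str, list[int]]) -> str:
    """Merge four nucleotide-specific ladders into one sequence read 5' to 3'."""
    call: dict[int, str] = {}
    for base, lengths in ladders.items():
        for n in lengths:
            if call.get(n, base) != base:          # two ladders claim one position
                raise ValueError(f"conflicting bands at position {n}")
            call[n] = base
    length = max(call)
    gaps = [n for n in range(1, length + 1) if n not in call]
    if gaps:                                       # no ladder has a band here
        raise ValueError(f"no band at positions {gaps}")
    return "".join(call[n] for n in range(1, length + 1))

ladders = {"A": [2, 5, 7, 11], "C": [6, 10], "G": [1, 8], "T": [3, 4, 9]}
print(read_ladders(ladders))   # GATTACAGTCA
```

The checks for conflicting or missing bands stand in for compressions and dropouts that a real reading procedure would have to resolve.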

An aspect of this invention is a method for cleaving a polynucleotide during
polymerization, comprising:

mixing together four different nucleotides, one or two of which are modified
nucleotides; and,

two or more polymerases, at least one of which produces or enhances
cleavage at points where said modified nucleotide is being incorporated or, if two
modified nucleotides are used, at points where an adjacent pair of said modified
nucleotides is being incorporated and is in the proper spatial relationship; provided
that, when only one modified nucleotide is used, it does not contain ribose as its only
modifying characteristic.

In the method just above, when two modified nucleotides are used, it is an
aspect of this invention that one of them is a ribonucleotide and one of them is a
5'-amino-2',5'-dideoxynucleotide.

Furthermore, in the method just above using the specific modified
nucleotides, it is an aspect of this invention to use two polymerases, one being
Klenow (exo-) polymerase and one being mutant E710A Klenow (exo-) polymerase.

In any of the above methods, it is an aspect of this invention that all natural 
nucleotides not being replaced with modified nucleotides can be replaced with mass- 
modified nucleotides. 

It is also an aspect of all methods of this invention that the polynucleotide
being modified is selected from the group consisting of DNA and RNA.

Another aspect of all of the above methods is detection of said masses of
said fragments by mass spectrometry. Presently preferred types of mass
spectrometry are electrospray ionization mass spectrometry and matrix-assisted
laser desorption/ionization mass spectrometry (MALDI).

In the above methods requiring the generation of a sequence ladder, such 
generation can be accomplished using gel electrophoresis. 

Furthermore, in the above method relating to determining a polynucleotide
sequence by partially replacing a natural nucleotide with a modified nucleotide,
cleaving said first, second, third and fourth partially modified polynucleotides
obtained in step "a" with one or more restriction enzymes, labeling the ends of the
restriction fragments obtained, and purifying the restriction fragments, prior to
performing step "b" is another aspect of this invention.

An aspect of this invention is a method for cleaving a polynucleotide such that
substantially all fragments obtained from the cleavage carry a label, comprising:

a. replacing a natural nucleotide partially or at substantially each point of
occurrence in a polynucleotide with a modified nucleotide to form a modified
polynucleotide;

b. contacting, in the presence of a phosphine covalently bonded to a label, said
modified polynucleotide with a reagent or reagents which cleave(s) the modified
polynucleotide partially or at substantially each said point of occurrence.

In a presently preferred embodiment of this invention, the phosphine in the
above method is tris(2-carboxyethyl)phosphine (TCEP).

In another aspect of this invention, the label in the method just above is a
fluorescent tag or a radioactive tag.

It is an aspect of this invention that the above methods can be used for
diagnosing a genetically-related disease. The methods can also be used as a
means for obtaining a prognosis of a genetically-related disease or disorder. They
can also be used to determine if a particular patient is eligible for medical treatment
by procedures applicable to genetically-related diseases or disorders.

An aspect of this invention is a method for detecting a variance in nucleotide
sequence in a polynucleotide, for sequencing a polynucleotide or for genotyping a
polynucleotide known to contain a polymorphism or mutation, comprising:

a. replacing one or more natural nucleotides in said polynucleotide with
one or more modified nucleotides, one or more of which comprises a modified base;

b. contacting said modified polynucleotide with a reagent or reagents
which cleave the modified polynucleotide into fragments at site(s) of incorporation of
said modified nucleotide; and,

c. analyzing said fragments to detect said variance, to construct said
sequence or to genotype said polynucleotide.

In another aspect of this invention, the modified base in the above method can be
a modified adenine; it can be, for example, 7-deaza-7-nitroadenine.

In another aspect of this invention, a polynucleotide modified as above can be
cleaved into fragments by contact with chemical base.

In yet another aspect of this invention, cleaving said modified polynucleotide
into fragments in the above method comprises contacting said modified
polynucleotide with a phosphine.

Using TCEP as the phosphine in the above method is another aspect of this
invention.

The modified base in the above method can also be a modified cytosine such
as, without limitation, azacytosine or cytosine substituted at the 5-position with an
electron-withdrawing group wherein the electron-withdrawing group is, also without
limitation, nitro or halo.

Once again, polynucleotides modified as noted just above can be cleaved
with chemical base.

Inclusion of TCEP in the cleaving reaction immediately above is another
aspect of this invention.

The modified base in the above method can also be a modified guanine such
as, without limitation, 7-methylguanine, and cleavage can be carried out with
chemical base.

The modified guanine is N2-allylguanine in a further aspect of this invention.
Cleaving this modified guanine by contacting said modified polynucleotide with an
electrophile, such as, without limitation, iodine, is another aspect of this invention.

In another aspect of this invention, the modified base in the above method
can also be a modified thymine or a modified uracil. A presently preferred
embodiment of this invention is the use of 5-hydroxyuracil in place of either thymine
or uracil. When 5-hydroxyuracil is used, cleavage is accomplished by:

a. contacting said polynucleotide with a chemical oxidant; and, then

b. contacting said polynucleotide with chemical base.

Another aspect of this invention is a method for detecting a variance in
nucleotide sequence in a polynucleotide, sequencing a polynucleotide or genotyping
a polynucleotide comprising replacing one or more natural nucleotides in said
polynucleotide with one or more modified nucleotides, one or more of which
comprises a modified sugar, with the proviso that, when only one nucleotide is being
replaced, said modified sugar is not ribose.

The modified sugar is a 2-ketosugar in a further aspect of this invention. The
keto sugar can be cleaved with chemical base.

The modified sugar can also be arabinose, which is also susceptible to
chemical base.

The modified sugar can also be a sugar substituted with a 4-hydroxymethyl
group which, likewise, renders a polynucleotide susceptible to cleavage with
chemical base.

On the other hand, the modified sugar can be hydroxycyclopentane, in
particular 1-hydroxy- or 2-hydroxycyclopentane. The hydroxycyclopentanes can
also be cleaved with chemical base.

The modified sugar can be an azido sugar, for example, without limitation, a
2'-azido, 4'-azido or 4'-azidomethyl sugar. Cleaving an azido sugar can be
accomplished in the presence of TCEP.

The sugar can also be substituted with a group capable of photolyzing to form
a free radical such as, without limitation, a phenylselenyl or a t-butylcarboxy group.
Such groups render the polynucleotide susceptible to cleavage with ultraviolet light.
The sugar can also be a cyanosugar. In a presently preferred embodiment,
the cyanosugar is a 2'-cyanosugar or 2"-cyanosugar. The cyanosugar-modified
polynucleotides can be cleaved with chemical base.

A sugar substituted with an electron-withdrawing group, such as, without
limitation, fluorine, azido, methoxy or nitro, in the 2', 2" or 4' position of the modified
sugar is another aspect of this invention. These modified sugars render the modified
polynucleotide susceptible to cleavage with chemical base.

On the other hand, a sugar can be modified by inclusion of an electron-withdrawing
element in the sugar ring. Nitrogen is an example of such an element. The
nitrogen can replace the ring oxygen of the sugar or a ring carbon, and the resultant
modified sugar is cleavable with chemical base.

In yet another aspect of this invention, the modified sugar can be a sugar
containing a mercapto group. Substitution at the 2' position of the sugar is a
presently preferred embodiment, such a sugar being cleavable by chemical base.

In particular, the modified sugar can be a 5-methylenyl-sugar, a 5'-keto-sugar
or a 5',5'-difluoro-sugar, all of which are cleavable with chemical base.

Another aspect of this invention is a method for detecting a variance in
nucleotide sequence in a polynucleotide, sequencing a polynucleotide or genotyping
a polynucleotide known to contain a polymorphism or mutation comprising replacing
one or more natural nucleotides in said polynucleotide with one or more modified
nucleotides, one or more of which comprises a modified phosphate ester.

The modified phosphate ester can be a phosphorothioate.

In one embodiment, the sulfur of the phosphorothioate is not covalently
bonded to the sugar ring. In this case, cleaving said modified polynucleotide into
fragments comprises:

a. contacting said sulfur of said phosphorothioate with an alkylating
agent; and,

b. then contacting said modified polynucleotide with chemical base.

In a presently preferred embodiment of this invention, the alkylating agent is
methyl iodide.

In another aspect of this invention, the phosphorothioate-containing modified
polynucleotide can be cleaved into fragments by contacting said sulfur of said
phosphorothioate with β-mercaptoethanol in a chemical base such as, without
limitation, sodium methoxide in methanol.

On the other hand, the sulfur atom can be covalently bonded to a sugar ring
(i.e., a phosphorothiolate linkage) in another embodiment of this invention.
Cleavage of a polynucleotide so modified can be carried out with chemical base.

The modified phosphate ester can also be a phosphoramidate. Cleavage of a
phosphoramidate-containing polynucleotide can be performed using acid.

It is an aspect of this invention that the modified phosphate ester comprises a
group selected from the group consisting of alkyl phosphonate and alkyl
phosphorotriester wherein the alkyl group is preferably methyl. Such a modified
polynucleotide can also be cleaved with acid.

Another aspect of this invention is a method for detecting a variance in
nucleotide sequence in a polynucleotide, sequencing a polynucleotide or genotyping
a polynucleotide known to contain a polymorphism or mutation, comprising replacing
a first and a second natural nucleotide in said polynucleotide with a first and a
second modified nucleotide such that said polynucleotide can be specifically
cleaved at sites where the first modified nucleotide is followed immediately in the
modified polynucleotide sequence by said second modified nucleotide.

In the above method, the first modified nucleotide is covalently bonded at its
5' position to a sulfur atom of a phosphorothioate group, and said second modified
nucleotide, which is modified with a 2'-hydroxy group, is contiguous to, and 5' of, said
first modified nucleotide. This dinucleotide pair is cleavable with chemical base.

Also in the above method, the first modified nucleotide can be covalently
bonded at its 3' position to a sulfur atom of a phosphorothioate group where said
second modified nucleotide, which is modified with a 2'-hydroxy group, is contiguous
to and 3' of said first modified nucleotide. This modified nucleotide pair can also be
cleaved with chemical base.

It is also an aspect of this invention that, in the above method, said first
modified nucleotide is covalently bonded at its 5' position to a first oxygen atom of a
phosphorothioate group, said second modified nucleotide is substituted at its 2'
position with a leaving group and said second modified nucleotide is covalently
bonded at its 3' position to a second oxygen atom of said phosphorothioate group. Any
leaving group can be used; fluorine, chlorine, bromine and iodine are examples.

The polynucleotide so modified can be cleaved with chemical base. Sodium
methoxide is an example, without limitation, of a useful chemical base.

In another embodiment of this invention, said first modified nucleotide is
covalently bonded at its 5' position to a first oxygen atom of a phosphorothioate
group, said second modified nucleotide is substituted at its 4' position with a leaving
group and said second modified nucleotide is covalently bonded at its 3' position to a
second oxygen atom of said phosphorothioate group. Here, again, any good leaving
group can be used, of which fluorine, chlorine, bromine and iodine are non-limiting
examples. These groups likewise render the modified polynucleotide susceptible to
cleavage by chemical base such as, without limitation, sodium methoxide.

In a further embodiment of this invention, said first modified nucleotide is
covalently bonded at its 5' position to a first oxygen atom of a phosphorothioate
group, said second modified nucleotide is substituted at its 2' position with one or
two fluorine atoms and said second modified nucleotide is covalently bonded at its 3'
position to a second oxygen atom of said phosphorothioate group. Such a modified
polynucleotide can be cleaved by:

a. contacting said modified polynucleotide with ethylene sulfide or
β-mercaptoethanol; and then,

b. contacting said modified polynucleotide with a chemical base such as,
without limitation, sodium methoxide.

Another embodiment of this invention has said first modified nucleotide
covalently bonded at its 5' position to a first oxygen atom of a phosphorothioate
group, said second modified nucleotide substituted at its 2' position with a hydroxy
group and said second modified nucleotide covalently bonded at its 3' position to a
second oxygen atom of said phosphorothioate group. Here, cleavage can be
accomplished by:

a. contacting said modified polynucleotide with a metal oxidant; and then,

b. contacting said modified polynucleotide with a chemical base.

Non-limiting examples of metal oxidants are Cu(II) and Fe(III), and equally non-limiting
examples of useful bases are dilute hydroxide, piperidine and dilute
ammonium hydroxide.

It is also an embodiment of this invention that said first modified nucleotide is
covalently bonded at its 5' position to a nitrogen atom of a phosphoramidate group
and said second modified nucleotide, which is modified with a 2'-hydroxy group, is
contiguous to and 5' of said first modified nucleotide. This type of modification
renders the modified polynucleotide susceptible to acid cleavage.

A still further embodiment of this invention is one in which said first modified
nucleotide is covalently bonded at its 3' position to a nitrogen atom of a
phosphoramidate group and said second modified nucleotide, which is modified with
a 2'-hydroxy group, is contiguous to and 3' of said first modified nucleotide. Again,
such a substitution pattern is cleavable with acid.

It also may be that said first modified nucleotide is covalently bonded at its 5'
position to an oxygen atom of an alkylphosphonate or an alkylphosphorotriester
group and said second modified nucleotide, which is modified with a 2'-hydroxy
group, is contiguous to said first modified nucleotide. This alternative dinucleotide
grouping is also cleavable with acid.

Another cleavable dinucleotide grouping is one in which said first modified
nucleotide has an electron-withdrawing group at its 4' position and said second
modified nucleotide, which is modified with a 2'-hydroxy group, is contiguous to and
5' of said first modified nucleotide. Again, cleavage can be accomplished by contact
with acid.

Another aspect of this invention is a method for detecting a variance in
nucleotide sequence in a polynucleotide, for sequencing a polynucleotide or for
genotyping a polynucleotide known to contain a polymorphism or mutation
comprising:

a. replacing one or more natural nucleotides in said polynucleotide
with one or more modified nucleotides wherein each modified nucleotide is modified
with one or more modifications selected from the group consisting of a modified
base, a modified sugar and a modified phosphate ester, provided that, if only one
modified nucleotide is used, said modified nucleotide is not a ribonucleotide;

b. contacting said modified polynucleotide with a reagent or
reagents which cleave the modified polynucleotide into fragments at site(s) of
incorporation of said modified nucleotide;

c. analyzing said fragments to detect said variance, to construct
said sequence or to genotype said polynucleotide.
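Steps a to c can be pictured with a small Python sketch in which one natural base is fully replaced by a cleavable modified nucleotide and the fragment-length profiles of a reference and a test polynucleotide are compared. The names are hypothetical, the strand is cut immediately 3' of each modified nucleotide (an assumed convention; the actual position of the break depends on the chemistry used), and fragment lengths stand in for the masses or mobilities an analysis would actually measure.

```python
from collections import Counter


def cleave_after(seq: str, modified: str) -> list[str]:
    """Split `seq` immediately 3' of every occurrence of `modified`.

    Simplifying assumption: the modified nucleotide stays on the 5' fragment.
    """
    fragments, start = [], 0
    for i, base in enumerate(seq):
        if base == modified:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def length_differences(reference: str, sample: str, modified: str):
    """Return (lengths seen only in reference, lengths seen only in sample)
    as Counters; two empty Counters mean no variance is visible."""
    ref = Counter(len(f) for f in cleave_after(reference, modified))
    smp = Counter(len(f) for f in cleave_after(sample, modified))
    return ref - smp, smp - ref


if __name__ == "__main__":
    only_ref, only_sample = length_differences("GATTACA", "GACTACA", "T")
    print(dict(only_ref), dict(only_sample))
```

For the T-to-C change shown, the reference yields fragments of lengths 3, 1 and 3 and the test strand yields 4 and 3, so the unmatched lengths flag the variance.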

An aspect of this invention is a compound having the chemical structure:

[chemical structure not reproduced]

wherein R1 is selected from the group consisting of:
A compound having the chemical structure:

wherein said "Base" is selected from the group consisting of cytosine, guanine,
inosine and uracil is another aspect of this invention.

Another aspect of this invention is a compound having the chemical structure: 



[chemical structure not reproduced]




WO 00/18967 



32 



PCT/US99/22988 



wherein said "Base" is selected from the group consisting of adenine, cytosine, 
guanine, inosine and uracil. 

A still further aspect of this invention is a compound having the chemical 
structure: 
[chemical structures not reproduced]

wherein said "Base" is selected from the group consisting of adenine, cytosine,
guanine, inosine, thymine and uracil. 

A polynucleotide comprising a dinucleotide sequence selected from the group
consisting of:

wherein each "Base" is independently selected from the group consisting of adenine,
cytosine, guanine and thymine; W is an electron withdrawing group; X is a leaving
group; and R is an alkyl, preferably a lower alkyl, group, is also an aspect of this
invention. In another aspect of this invention, the electron withdrawing group is
selected from the group consisting of F, Cl, Br, I, NO2, C≡N, -C(O)OH and OH and, in a
still further aspect, the leaving group is selected from the group consisting of Cl, Br, I
and OTs.

An aspect of this invention is a method for synthesizing a polynucleotide
comprising mixing a compound having the chemical structure:

wherein R1 is selected from the group consisting of:

with adenosine triphosphate, guanosine triphosphate, and thymidine or uridine
triphosphate in the presence of one or more polymerases.

A method for synthesizing a polynucleotide comprising mixing a compound
having the chemical structure:

wherein R1 is selected from the group consisting of:

with adenosine triphosphate, cytidine triphosphate and guanosine triphosphate in
the presence of one or more polymerases is also an aspect of this invention.

A method for synthesizing a polynucleotide, comprising mixing a compound
having the chemical structure:

wherein R1 is selected from the group consisting of:

with cytidine triphosphate, guanosine triphosphate, and thymidine triphosphate in
the presence of one or more polymerases is a further aspect of this invention.

An aspect of this invention is a method for synthesizing a polynucleotide,
comprising mixing a compound having the chemical structure:

wherein R1 is selected from the group consisting of:

with adenosine triphosphate, cytidine triphosphate and thymidine triphosphate in the
presence of one or more polymerases.

Another aspect of this invention is a method for synthesizing a polynucleotide,
comprising mixing a compound selected from the group consisting of:

a compound having the chemical structure:

[chemical structure not reproduced]

wherein said "Base" is selected from the group consisting of cytosine, guanine,
inosine and uracil;

a compound having the chemical structure:

wherein said "Base" is selected from the group consisting of adenine, cytosine,
guanine, inosine and uracil; and

a compound having the chemical structure:

wherein the "Base" is selected from the group consisting of adenine, cytosine,
guanine or inosine, and thymine or uracil,

with whichever three of the four nucleoside triphosphates, adenosine triphosphate,
cytidine triphosphate, guanosine triphosphate and thymidine triphosphate, do not
contain said base (or its substitute), in the presence of one or more polymerases.

Another aspect of this invention is a method for synthesizing a polynucleotide,
comprising mixing one of the following pairs of compounds:

[chemical structures not reproduced]

wherein:

Base1 is selected from the group consisting of adenine, cytosine, guanine or inosine,
and thymine or uracil;

Base2 is selected from the group consisting of the remaining three bases which are
not Base1;

R3 is O⁻-P(=O)(O⁻)-O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-;
W is an electron withdrawing group;
X is a leaving group;
a second W or X shown in parentheses on the same carbon atom means that a
single W or X group can be in either position on the sugar or both W or both X
groups can be present at the same time; and,
R is a lower alkyl group;
with whichever two of the four nucleoside triphosphates, adenosine triphosphate,
cytidine triphosphate, guanosine triphosphate and thymidine triphosphate, do not
contain Base1 or Base2 (or their substitutes), in the presence of one or more
polymerases.
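Choosing which natural triphosphates to add alongside the modified compounds is a simple set complement, sketched below in Python. The sketch assumes DNA synthesis (deoxyribonucleoside triphosphates), treats inosine as the substitute for guanosine and uridine as the substitute for thymidine, and uses a hypothetical function name. With one modified base it returns the three remaining natural triphosphates, as in the preceding method; with two it returns the remaining two, as in this one.

```python
# Natural deoxynucleoside triphosphates, keyed by base (DNA synthesis assumed).
NATURAL_TRIPHOSPHATES = {"A": "dATP", "C": "dCTP", "G": "dGTP", "T": "dTTP"}
# Inosine substitutes for guanosine; uridine substitutes for thymidine.
SUBSTITUTE_FOR = {"I": "G", "U": "T"}


def remaining_triphosphates(*modified_bases: str) -> list[str]:
    """Return the natural triphosphates to mix with the modified compound(s)."""
    replaced = {SUBSTITUTE_FOR.get(b, b) for b in modified_bases}
    if not replaced <= set(NATURAL_TRIPHOSPHATES):
        raise ValueError(f"unknown base in {modified_bases}")
    if len(replaced) != len(modified_bases):
        # e.g. "G" and "I" would both replace guanosine
        raise ValueError("each modified nucleotide must replace a different base")
    return sorted(t for b, t in NATURAL_TRIPHOSPHATES.items() if b not in replaced)
```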

An aspect of this invention is a mutant polymerase which is capable of
catalyzing the incorporation of a modified nucleotide into a polynucleotide wherein
said modified nucleotide is not a ribonucleotide. In another aspect of this invention,
said polymerase is obtained by a process comprising DNA shuffling.

The process including DNA shuffling can comprise the following steps:

a. selecting one or more known polymerase(s);

b. performing DNA shuffling;

c. transforming shuffled DNA into a host cell;

d. growing host cell colonies;

e. forming a lysate from said host cell colonies;

f. adding to said lysate a DNA template containing a detectable reporter sequence, the
modified nucleotide or nucleotides whose incorporation into a polynucleotide is
desired and the natural nucleotides not being replaced by said modified
nucleotide(s); and,

g. examining the lysate for the presence of the detectable reporter.
The process including DNA shuffling can also comprise:

a. selecting a known polymerase or two or more known polymerases
having different sequences or different biochemical properties or both;

b. performing DNA shuffling;

c. transforming said shuffled DNA into a host to form a library of
transformants in host cell colonies;

d. preparing first separate pools of said transformants by plating said host
cell colonies;

e. forming a lysate from each said first separate pool of host cell colonies;
f. removing all natural nucleotides from each said lysate;

g. combining each said lysate with:

i. a single-stranded DNA template comprising a sequence
corresponding to an RNA polymerase promoter followed by a
reporter sequence;

ii. a single-stranded DNA primer complementary to one end of said
template;

iii. the modified nucleotide or nucleotides whose incorporation into
said polynucleotide is desired; and,

iv. each natural nucleotide not being replaced by said modified
nucleotide or nucleotides;

h. adding RNA polymerase to each said combined lysate;

i. examining each said combined lysate for the presence of said reporter
sequence;

j. creating second separate pools of transformants in host cell colonies
from each said first separate pool of host cell colonies in which the presence of said
reporter is detected;

k. forming a lysate from each said second separate pool of host cell
colonies;

l. repeating steps g, h, i, j, k and l to form separate pools of
transformants in host cell colonies until only one host cell colony remains which
contains said polymerase; and,

m. recloning said polymerase from said one host cell colony into a protein
expression vector.
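The iterative narrowing in steps i to l amounts to repeatedly subdividing a reporter-positive pool until a single colony remains. The Python sketch below captures only that bookkeeping. The `assay` callable is a hypothetical stand-in for steps g to i (preparing a lysate, adding template, primer, nucleotides and RNA polymerase, and checking for the reporter); the number of sub-pools per round and the choice to carry forward only the first positive sub-pool are assumptions, not requirements of the process.

```python
from collections.abc import Callable, Hashable, Sequence


def narrow_to_single_colony(
    colonies: Sequence[Hashable],
    assay: Callable[[list], bool],
    n_pools: int = 4,
) -> Hashable:
    """Subdivide a reporter-positive pool round by round, keeping one
    positive sub-pool each time, until a single colony remains.

    `assay` (hypothetical) returns True if a lysate made from the given
    colonies yields the reporter sequence.
    """
    if n_pools < 2:
        raise ValueError("n_pools must be at least 2")
    pool = list(colonies)
    if not pool or not assay(pool):
        raise ValueError("starting pool gives no reporter signal")
    while len(pool) > 1:
        size = -(-len(pool) // n_pools)  # ceiling division
        subpools = [pool[i:i + size] for i in range(0, len(pool), size)]
        positive = next((p for p in subpools if assay(p)), None)
        if positive is None:
            raise RuntimeError("reporter signal lost on subdivision")
        pool = positive
    return pool[0]
```

With 100 colonies and four sub-pools per round, a single active colony is isolated after four rounds (100, 25, 7, 2, then 1 colony).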

A polymerase which is capable of catalyzing the incorporation of a modified
nucleotide into a polynucleotide, wherein said modified nucleotide is not a
ribonucleotide, and which is obtained by a process comprising cell senescence
selection is another aspect of this invention.

The cell senescence selection process can comprise the following steps:

a. mutagenizing a known polymerase to form a library of mutant
polymerases;

b. cloning said library into a vector;

c. transforming said vector into host cells selected so as to be
susceptible to being killed by a selected chemical only when said cells are actively
growing;

d. adding a modified nucleotide;

e. growing said host cells;

f. treating said host cells with said selected chemical;

g. separating living cells from dead cells; and,

h. isolating said polymerase or polymerases from said living cells.

In another aspect of this invention, steps d to h of the above method are repeated
one or more times to refine the selection of the polymerase.

The cell senescence procedure for obtaining a polymerase can also comprise
the steps of:

a. mutagenizing a known polymerase to form a library of mutant
polymerases;

b. cloning said library of mutant polymerases into a plasmid vector;

c. transforming, with said plasmid vector, bacterial cells that are
susceptible to an antibiotic when growing;

d. selecting transfectants using said antibiotic;

e. introducing a modified nucleotide, as the corresponding nucleoside
triphosphate, into the bacterial cells;

f. growing the cells;

g. adding an antibiotic which will kill bacterial cells that are actively
growing;

h. isolating said bacterial cells;

i. growing said bacterial cells in fresh medium containing no antibiotic;

j. selecting live cells from growing colonies;

k. isolating said plasmid vector from said live cells;

l. isolating said polymerase; and,

m. assaying said polymerase.

Repeating steps c to k of the above process one or more additional times before
proceeding to step l is another aspect of this invention.

A polymerase may also be obtained by a process comprising phage display.
The phage display process may comprise the steps of:

a. selecting a DNA polymerase;

b. expressing said polymerase in a bacteriophage vector as a fusion to
a bacteriophage coat protein;

c. attaching an oligonucleotide to the surface of the phage;

d. forming a primer-template complex either by addition of a second
oligonucleotide complementary to the oligonucleotide of step c or by formation of a
self-priming complex using intramolecular complementarity of the oligonucleotide of
step c;

e. performing a primer extension in the presence of the modified
nucleotide or nucleotides whose incorporation into a polynucleotide is desired, and
the natural nucleotides not being replaced by said modified nucleotide(s), where
successful primer extension results in the presence of a detectable reporter
sequence; and,

f. sorting the phage with the detectable reporter from those without the
detectable reporter.

In another aspect of this invention, the detectable reporter sequence is formed by
incorporation of one or more dye-labeled natural or modified nucleotides in the
primer extension reaction.

In yet another aspect of this invention, the sorting procedure may comprise use of a
fluorescence-activated cell sorter.

An aspect of this invention is that the detectable reporter in the above method
is a restriction endonuclease cleavage site and the sorting procedure entails
restriction endonuclease digestion.

That the polymerase obtained in the above methods is a thermostable
polymerase is another aspect of this invention.

The polymerase obtained by any of the above methods wherein the modified
nucleotide being incorporated is selected from the group consisting of:

a compound having the chemical structure:

wherein R1 is selected from the group consisting of:

a compound having the chemical structure:

[chemical structure not reproduced]

wherein said "Base" is selected from the group consisting of cytosine, guanine,
inosine and uracil;
a compound having the chemical structure:

wherein said "Base" is selected from the group consisting of adenine, cytosine,
guanine, inosine, thymine and uracil; and,

a compound selected from the group consisting of:

wherein:

Base1 is selected from the group consisting of adenine, cytosine, guanine or inosine,
and thymine or uracil;

Base2 is selected from the group consisting of the remaining three bases which are
not Base1;

R3 is O⁻-P(=O)(O⁻)-O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-;
W is an electron withdrawing group;
X is a leaving group;

a second W or X shown in parentheses on the same carbon atom means that a
single W or X group can be in either position on the sugar or both W or both X
groups can be present at the same time; and,
R is a lower alkyl group.

A final aspect of this invention is a kit, comprising:

one or more modified nucleotides;

one or more polymerases capable of incorporating said one or more modified
nucleotides in a polynucleotide to form a modified polynucleotide; and,

a reagent or reagents capable of cleaving said modified polynucleotide at
each point of occurrence of said one or more modified nucleotides in said
polynucleotide.

As used herein, a "chemical method" refers to a combination of one or more
modified nucleotides and one or more reagents which, when the modified
nucleotide(s) are incorporated into a polynucleotide by partial or complete substitution
for a natural nucleotide and the modified polynucleotide is subjected to the
reagent(s), results in the selective cleavage of the modified polynucleotide at the
point(s) of incorporation of the modified nucleotide(s).

By "analysis" is meant either detection of variance in the nucleotide sequence 
among two or more related polynucleotides or, in the alternative, the determination 
of the full nucleotide sequence of a polynucleotide. 

By "reagent" is meant a chemical or physical force which causes the cleavage
of a modified polynucleotide at the point of incorporation of a modified nucleotide in
place of a natural nucleotide; such a reagent may be, without limitation, a chemical
or combination of chemicals, normal or coherent (laser) visible or UV light, heat, high
energy ion bombardment and irradiation. In addition, a reagent may consist of a
protein such as, without limitation, a polymerase.

"Related" polynucleotides are polynucleotides obtained from genetically
similar sources such that the nucleotide sequence of the polynucleotides would be
expected to be exactly the same in the absence of a variance or there would be
expected to be a region of overlap that, in the absence of a variance, would be
exactly the same, where the region of overlap is greater than 35 nucleotides.

A "variance" is a difference in the nucleotide sequence among related
polynucleotides. The difference may be the deletion of one or more nucleotides
from the sequence of one polynucleotide compared to the sequence of a related
polynucleotide, the addition of one or more nucleotides or the substitution of one
nucleotide for another. The terms "mutation," "polymorphism" and "variance" are
used interchangeably herein. As used herein, the term "variance" in the singular is
to be construed to include multiple variances; i.e., two or more nucleotide additions,
deletions and/or substitutions in the same polynucleotide. A "point mutation" refers
to a single substitution of one nucleotide for another.
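The three kinds of variance defined above can be illustrated by aligning a reference and a sample sequence. The Python sketch below uses the standard library's `difflib.SequenceMatcher`, a general-purpose sequence matcher rather than a biological alignment algorithm, so the reported position of an addition or deletion within a repeated run (for example, which T of a TT run was lost) is one valid choice among several. The function names are hypothetical.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags onto the variance terms used in this application.
KIND = {"replace": "substitution", "delete": "deletion", "insert": "addition"}


def find_variances(reference: str, sample: str) -> list[tuple[str, int, str, str]]:
    """List (kind, reference position, reference bases, sample bases) for every
    non-matching block between two related sequences."""
    matcher = SequenceMatcher(None, reference, sample, autojunk=False)
    return [
        (KIND[tag], i1, reference[i1:i2], sample[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def is_point_mutation(reference: str, sample: str) -> bool:
    """True if the only variance is a single one-nucleotide substitution."""
    v = find_variances(reference, sample)
    return (
        len(v) == 1
        and v[0][0] == "substitution"
        and len(v[0][2]) == len(v[0][3]) == 1
    )
```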

A "sequence" or "nucleotide sequence" refers to the order of nucleotide
residues in a nucleic acid.

As noted above, one aspect of the chemical method of the present invention
consists of modified nucleotides which can be incorporated into a polynucleotide in
place of natural nucleotides.

A "nucleoside" refers to a base linked to a sugar. The base may be adenine
(A), guanine (G) (or its substitute, inosine (I)), cytosine (C), or thymine (T) (or its
substitute, uracil (U)). The sugar may be ribose (the sugar of a natural nucleotide in
RNA) or 2-deoxyribose (the sugar of a natural nucleotide in DNA).

A "nucleoside triphosphate" refers to a nucleoside linked to a triphosphate
group (O⁻-P(=O)(O⁻)-O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-nucleoside). The triphosphate group has
four formal negative charges which require counter-ions, i.e., positively charged
ions. Any positively charged ion can be used, e.g., without limitation, Na⁺, K⁺, NH₄⁺,
Mg²⁺, Ca²⁺, etc. Na⁺ is one of the most commonly used counter-ions. It is accepted
convention in the art to omit the counter-ion, which is understood to be present,
when displaying nucleoside triphosphates and that convention will be followed in this
application.

As used herein, unless expressly noted otherwise, the term "nucleoside
triphosphate" or reference to any specific nucleoside triphosphate; e.g., adenosine
triphosphate, guanosine triphosphate or cytidine triphosphate, refers to the
triphosphate made using either a ribonucleoside or a 2'-deoxyribonucleoside.

A "nucleotide" refers to a nucleoside linked to a single phosphate group or, by
convention, when referring to incorporation into a polynucleotide, a shorthand for the
nucleoside triphosphate, which is the species that actually polymerizes in the
presence of a polymerase.

A "natural nucleotide" refers to an A, C, G or U nucleotide when referring to
RNA and to dA, dC, dG (the "d" referring to the fact that the sugar is a deoxyribose)
and dT when referring to DNA. A natural nucleotide also refers to a nucleotide
which may have a different structure from the above, but which is naturally
incorporated into a polynucleotide sequence by the organism which is the source of
the polynucleotide.

As used herein, inosine (I) refers to a purine ribonucleoside containing the 
base hypoxanthine. 

As used herein, a "substitute" for a nucleoside triphosphate refers to a
nucleoside triphosphate containing a different nucleoside that may naturally be
substituted for A, C, G or T. Thus, inosine is a natural substitute for guanosine and
uridine is a natural substitute for thymidine.

As used herein, a "modified nucleotide" is characterized by two criteria. First,
a modified nucleotide is a "non-natural" nucleotide. In one aspect, a "non-natural"
nucleotide may be a natural nucleotide which is placed in non-natural surroundings.
For example, in a polynucleotide which is naturally composed of
deoxyribonucleotides, a ribonucleotide would constitute a "non-natural" nucleotide
when incorporated into that polynucleotide. Conversely, in a polynucleotide which is
naturally composed of ribonucleotides, a deoxyribonucleotide incorporated into that
polynucleotide would constitute a non-natural nucleotide. In addition, a "non-natural"
nucleotide may be a natural nucleotide which has been chemically altered, for
example, without limitation, by the addition of one or more chemical substituent
groups to the nucleotide molecule, the deletion of one or more chemical substituent
groups from the molecule or the replacement of one or more atoms or chemical
substituents in the nucleotide with other atoms or chemical substituents. Finally, a
"modified" nucleotide may be a molecule that resembles a natural nucleotide little, if
at all, but is nevertheless capable of being incorporated by a polymerase into a
polynucleotide in place of a natural nucleotide.

The second criterion by which a "modified" nucleotide, as the term is used
herein, is characterized is that it alters the cleavage properties of the polynucleotide
into which it is incorporated. For example, without limitation, incorporation of a
ribonucleotide into a polynucleotide composed predominantly of
deoxyribonucleotides imparts a susceptibility to alkaline cleavage which does not
exist in natural deoxyribonucleotides. This second criterion of a "modified" nucleotide
may be met by a single non-natural nucleotide substituted for a single natural
nucleotide (e.g., the substitution of ribonucleotide for deoxyribonucleotide described
above) or by a combination of two or more non-natural nucleotides which, when
subjected to selected reaction conditions, do not individually alter the cleavage
properties of a polynucleotide but, rather, interact with one another to impose altered
cleavage properties on the polynucleotide (termed "dinucleotide cleavage").

When reference is made herein to the incorporation of a single modified
nucleotide into a polynucleotide and the subsequent cleavage of the modified
polynucleotide, the modified nucleotide cannot be a ribonucleotide.

"Having different cleavage characteristics," when referring to modified
nucleotides, means that each modified nucleotide incorporated into the same modified
polynucleotide can be cleaved under reaction conditions that leave the sites of
incorporation of each of the other modified nucleotides in that modified
polynucleotide intact.

As used herein, a "stabilizing modified nucleotide" refers to a modified
nucleotide that imparts increased resistance to cleavage at the site of
incorporation of such a modified nucleotide. Most of the modified nucleotides
described herein provide increased lability to cleavage when incorporated in a
modified polynucleotide. However, the differential lability of modified nucleotides
over natural nucleotides in a modified polynucleotide may not always be sufficient to
allow complete cleavage at the modified nucleotide(s) while avoiding any cleavage
at the natural nucleotides. Therefore there is a useful role for modified nucleotides
that reduce lability (stabilizing nucleotides): the presence of stabilizing
nucleotides in a polynucleotide which also contains nucleotides that increase lability
to a particular cleavage procedure (labilizing nucleotides) can provide increased
discrimination between cleaved and noncleaved nucleotides in a cleavage
procedure. The preferred way to use stabilizing nucleotides in a polynucleotide is to
substitute stabilizing nucleotides for all the nucleotides that are not labilizing
nucleotides. In the case of mononucleotide cleavage this would entail use of three
stabilizing nucleotides and one labilizing nucleotide; in the case of dinucleotide
cleavage this would entail use of two stabilizing nucleotides and two (different)
labilizing nucleotides. As used herein the term "stabilizing nucleotide" refers to a
modified nucleotide which, when incorporated in a polynucleotide and subjected to a
cleavage procedure, reduces cleavage at the stabilizing nucleotides relative to mono-
or dinucleotide cleavage at other (nonstabilizing) nucleotides of the polynucleotide,
whether said other nucleotides are natural nucleotides or labilizing nucleotides.

As used herein, a "destabilizing modified nucleotide" or a "labilizing modified
nucleotide" refers to a modified nucleotide which imparts greater susceptibility to
cleavage than a natural nucleotide at sites of incorporation of the destabilizing
modified nucleotide in a polynucleotide.

As used herein "determining a mass" refers to the use of a mass
spectrometer to determine the mass of a molecule. Mass spectrometers generally
measure the mass to charge ratio (m/z) of analyte ions, from which the mass can be
inferred. When the charge state of the analyte polynucleotide is +1 or -1 the m/z
ratio and the mass are numerically the same after making a correction for the proton
mass (an extra proton is added to positively charged ions and a proton is abstracted
from negatively charged ions) but when the charge is >+1 or <-1 the m/z ratio will
usually be less than the actual mass. In some cases the software provided with a
mass spectrometer computes the mass from m/z so the user does not need to be
aware of the difference.
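The proton correction described above reduces to a single formula. The Python sketch assumes that ions are formed only by adding or removing protons (for polynucleotides, sodium or other cation adducts are common in practice and would need separate handling) and uses a proton mass of approximately 1.007276 Da; the function name is hypothetical.

```python
PROTON_MASS = 1.007276  # proton mass in daltons (Da)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an analyte observed at `mz` with signed charge `charge`.

    Positive ions are taken as [M + zH]^z+, so M = z*(m/z) - z*m_p.
    Negative ions are taken as [M - |z|H]^|z|-, so M = |z|*(m/z) + |z|*m_p.
    Both cases reduce to |z|*(m/z) - z*m_p.
    """
    if charge == 0:
        raise ValueError("a neutral species has no m/z")
    return abs(charge) * mz - charge * PROTON_MASS
```

For example, a doubly deprotonated ion observed at m/z 499.0 corresponds to a neutral mass of 2 × (499.0 + 1.007276), about 1000.01 Da, which illustrates why m/z is smaller than the mass when the charge exceeds one in magnitude.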

As used herein, a "label" or "tag" refers to a molecule that, when appended
by, for example, without limitation, covalent bonding or hybridization, to another
molecule, for example, also without limitation, a polynucleotide or polynucleotide
fragment, provides or enhances a means of detecting the other molecule. A
fluorescence or fluorescent label or tag emits detectable light at a particular
wavelength when excited at a different wavelength. A radiolabel or radioactive tag
emits radioactive particles detectable with an instrument such as, without limitation,
a scintillation counter.

A "mass-modified" nucleotide is a nucleotide in which an atom or chemical
substituent has been added, deleted or substituted but such addition, deletion or
substitution does not create modified nucleotide properties, as defined herein, in the
nucleotide; i.e., the only effect of the addition, deletion or substitution is to modify the
mass of the nucleotide.

A "polynucleotide" refers to a linear chain of nucleotides connected by a
phosphodiester linkage between the 3'-hydroxyl group of one nucleoside and the 5'-
hydroxyl group of a second nucleoside which in turn is linked through its 3'-hydroxyl
group to the 5'-hydroxyl group of a third nucleoside and so on to form a polymer
comprised of nucleosides linked by a phosphodiester backbone. The polynucleotide
may be, without limitation, single or double stranded DNA or RNA or any other
structure known in the art.

A "modified polynucleotide" refers to a polynucleotide in which one or more
natural nucleotides have been partially or substantially completely replaced with
modified nucleotides.

A "modified DNA fragment" refers to a DNA fragment synthesized under
Sanger dideoxy termination conditions with one of the natural nucleotides other than
the one which is partially substituted with its dideoxy analog being replaced with a
modified nucleotide as defined herein. The result is a set of Sanger fragments; i.e.,
a set of fragments ending in ddA, ddC, ddG or ddT, depending on the dideoxy
nucleotide used with each such fragment also containing modified nucleotides (if, of
course, the natural nucleotide corresponding to the modified nucleotide exists in that
particular Sanger fragment).
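The set of modified Sanger fragments can be enumerated with a short Python sketch. It assumes the synthesized strand is given 5' to 3' starting immediately after the primer (the primer itself is omitted), lists every possible termination point even though in a real reaction only a fraction of chains terminate at each one, and uses a hypothetical function name. Each product is reported with the positions of the modified nucleotide it contains, which are the sites at which it could later be cleaved.

```python
def sanger_fragments(
    strand: str, dd_base: str, modified_base: str
) -> list[tuple[str, list[int]]]:
    """Enumerate chain-terminated products of `strand` (5' to 3', primer omitted).

    Each product ends at an occurrence of `dd_base` (the dideoxy terminator)
    and is paired with the 0-based positions of `modified_base` inside it.
    """
    if dd_base == modified_base:
        raise ValueError("the modified nucleotide must replace a different base")
    products = []
    for end, base in enumerate(strand):
        if base == dd_base:
            fragment = strand[:end + 1]
            modified_positions = [i for i, b in enumerate(fragment) if b == modified_base]
            products.append((fragment, modified_positions))
    return products
```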

As used herein, to "alter the cleavage properties" of a polynucleotide means
to render the polynucleotide differentially cleavable or non-cleavable; i.e., resistant to
cleavage, at the point of incorporation of the modified nucleotide relative to sites
consisting of other non-natural or natural nucleotides. It is presently preferred to
"alter the cleavage properties" by rendering the polynucleotide more susceptible to
cleavage at the sites of incorporation of modified nucleotides than at any other sites
in the molecule.

As used herein, the use of the singular when referring to nucleotide
substitution is to be construed as including substitution at each point of occurrence
of the natural nucleotide unless expressly noted to be otherwise.

As used herein, a "template" refers to a target polynucleotide strand, for
example, without limitation, an unmodified naturally-occurring DNA strand, which a
polymerase uses as a means of recognizing which nucleotide it should next
incorporate into a growing strand to polymerize the complement of the naturally-
occurring strand. Such DNA strand may be single-stranded or it may be part of a
double-stranded DNA template. In applications of the present invention requiring
repeated cycles of polymerization, e.g., the polymerase chain reaction (PCR), the
template strand itself may become modified by incorporation of modified nucleotides,
yet still serve as a template for a polymerase to synthesize additional
polynucleotides.

A "primer" is a short oligonucleotide, the sequence of which is complementary
to a segment of the template which is being replicated, and which the polymerase
uses as the starting point for the replication process. By "complementary" is meant
that the nucleotide sequence of a primer is such that the primer can form a stable
hydrogen bond complex with the template; i.e., the primer can hybridize to the
template by virtue of the formation of base-pairs over a length of at least ten base
pairs.

As used herein, a "polymerase" refers, without limitation, to molecules such
as DNA or RNA polymerases, reverse transcriptases, mutant DNA or RNA
polymerases mutagenized by nucleotide addition, nucleotide deletion, one or more
point mutations or the technique known to those skilled in the art as "DNA shuffling"
(q.v., infra) or by joining portions of different polymerases to make chimeric
polymerases. Combinations of these mutagenizing techniques may also be used. A
polymerase catalyzes the polymerization of nucleotides to form polynucleotides.
Methods for producing, identifying and using polymerases capable of efficiently
incorporating modified nucleotides along with natural nucleotides into a
polynucleotide are disclosed herein and are an aspect of this invention. Polymerases may
be used either to extend a primer once or repetitively or to amplify a polynucleotide
by repetitive priming of two complementary strands using two primers. Methods of
amplification include, without limitation, polymerase chain reaction (PCR), NASBA,
SDA, 3SR, TSA and rolling circle replication. It is understood that, in any method for
producing a polynucleotide containing given modified nucleotides, one or several
polymerases or amplification methods may be used. A "heat stable polymerase" or
"thermostable polymerase" refers to a polymerase which retains sufficient activity to
effect primer extension reactions after being subjected to elevated temperatures,
such as those necessary to denature double-stranded nucleic acids.

The selection of optimal polymerization conditions depends on the
application. In general, a form of primer extension may be best suited to sequencing
or variance detection methods that rely on dinucleotide cleavage and mass
spectrometric analysis while either primer extension or amplification (e.g., PCR) will
be suitable for sequencing methods that rely on electrophoretic analysis.
Genotyping methods are best suited to production of polynucleotides by
amplification. Either type of polymerization may be suitable for variance detection
methods of this invention.

A "restriction enzyme" refers to an endonuclease (an enzyme that cleaves
phosphodiester bonds within a polynucleotide chain) that cleaves DNA in response
to a recognition site on the DNA. The recognition site (restriction site) consists of a
specific sequence of nucleotides typically about 4-8 nucleotides long.
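As an illustration of how a recognition site determines where a restriction enzyme cuts, the following Python sketch scans a sequence for every occurrence of a site and splits the strand at a fixed offset within it. It is a minimal model under stated assumptions: only one strand is considered, and the helper names find_sites and cut_fragments are hypothetical. HaeIII, which recognizes 5'-GGCC-3' and cuts between the G and the C, is used only as an example because it appears in the figures.

```python
from __future__ import annotations


def find_sites(sequence: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site`.

    Overlapping matches are reported, since some recognition sites can overlap.
    """
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def cut_fragments(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Split `sequence` at each site, cutting `cut_offset` bases into the site."""
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    # Drop empty pieces that would arise from a cut at either end.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# HaeIII-like example: site GGCC, cut after the second G (GG^CC), top strand only.
seq = "ATGGCCTTAGGCCA"
print(find_sites(seq, "GGCC"))        # [2, 9]
print(cut_fragments(seq, "GGCC", 2))  # ['ATGG', 'CCTTAGG', 'CCA']
```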

As used herein, "electrophoresis" refers to that technique known in the art as
gel electrophoresis; e.g., slab gel electrophoresis, capillary electrophoresis and
automated versions of these, such as the use of an automated DNA sequencer or a
simultaneous multi-channel automated capillary DNA sequencer or electrophoresis
in an etched channel such as that which can be produced in glass or other
materials.

"Mass spectrometry" refers to a technique for mass analysis known in the art
which includes, but is not limited to, matrix-assisted laser desorption ionization
(MALDI) and electrospray ionization (ESI) mass spectrometry optionally employing,
without limitation, time-of-flight, quadrupole or Fourier transform detection
techniques. While the use of mass spectrometry constitutes a preferred
embodiment of this invention, it will be apparent that other instrumental techniques
are, or may become, available for the determination of the mass or the comparison
of masses of oligonucleotides. An aspect of the present invention is the
determination and comparison of masses and any such instrumental procedure
capable of such determination and comparison is deemed to be within the scope
and spirit of this invention.

As used herein, "FRET" refers to fluorescence resonance energy transfer, a
distance-dependent interaction between the electronic excited states of two dye
molecules in which excitation is transferred from one dye (the donor) to another dye
(the acceptor) without emission of a photon. A series of fluorogenic procedures
have been developed to exploit FRET. In the present invention, the two dye
molecules are generally located on opposite sides of a cleavable modified nucleotide
such that cleavage will alter the proximity of the dyes to one another and thereby
change the fluorescence output of the dyes on the polynucleotide.
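The distance dependence that FRET assays exploit is commonly described by the Förster relation E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient. The Python sketch below evaluates that relation for dyes held close together on an intact polynucleotide and for dyes separated after cleavage. The value R0 = 5 nm and both distances are assumed, illustrative numbers, not properties of any particular dye pair used in this invention.

```python
def fret_efficiency(r_nm, r0_nm=5.0):
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6).

    r_nm:  donor-acceptor distance in nanometres.
    r0_nm: Förster radius (assumed 5 nm here for illustration).
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Intact strand: dyes flank the cleavable modified nucleotide (assumed 3 nm apart).
intact = fret_efficiency(3.0)
# After cleavage the dyes are no longer tethered; modeled as 20 nm apart.
cleaved = fret_efficiency(20.0)

print(f"intact: {intact:.3f}, cleaved: {cleaved:.5f}")  # intact: 0.955, cleaved: 0.00024
```

Because donor emission scales roughly with (1 - E), the drop in transfer efficiency on cleavage would appear as recovered donor fluorescence, which is the kind of change in fluorescence output described above.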

As used herein "construct a gene sequence" refers to the process of inferring
partial or complete information about the DNA sequence of a subject polynucleotide
by analysis of the masses of its fragments obtained by a cleavage procedure. The
process of constructing a gene sequence generally entails comparison of a set of
experimentally determined cleavage masses with the known or predicted masses of
all possible polynucleotides that could be obtained from the subject polynucleotide
given only the constraints of the modified nucleotide(s) incorporated in the
polynucleotide and the chemical reaction mechanism(s) utilized, both of which
impact the range of possible constituent masses. Various analytical deductions may
then be employed to extract the greatest amount of sequence information from the
masses of the cleavage fragments. More sequence information can generally be
inferred when the subject polynucleotide is modified and cleaved, in separate
reactions, by two or more modified nucleotides or sets of modified nucleotides
because the range of deductions that may be made from analysis of several sets of
cleavage fragments is greater.
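To make the comparison step concrete, the Python sketch below enumerates every base composition up to a given length, predicts its mass, and lists the compositions consistent with an observed fragment mass. The residue masses are approximate average values assumed for illustration (they are not taken from Table 2), cleavage end-group chemistry is ignored, and the `alphabet` argument is a hypothetical stand-in for the constraints imposed by the modified nucleotide and the reaction mechanism.

```python
from itertools import combinations_with_replacement

# Approximate average residue masses (Da) of DNA nucleotides within a chain.
# Assumed, illustrative values; end groups created by cleavage are ignored.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def composition_masses(max_len, alphabet="ACGT"):
    """Map every base composition of 1..max_len bases to its summed mass."""
    table = {}
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(sorted(alphabet), n):
            comp = "".join(combo)
            table[comp] = sum(RESIDUE_MASS[b] for b in comp)
    return table


def match_mass(observed, table, tol=0.5):
    """Return compositions whose predicted mass lies within tol Da of observed."""
    return sorted(c for c, m in table.items() if abs(m - observed) <= tol)


# Hypothetical constraint: fragments from a G-cleavage reaction contain no G.
print(match_mass(906.59, composition_masses(4, "ACT")))      # ['ACT']
# Without that constraint and with a looser tolerance, the assignment is ambiguous.
print(match_mass(906.59, composition_masses(4), tol=1.0))    # ['ACT', 'CCG']
```

Because mass depends only on composition, a match fixes base content but not order (ACT, CAT and TCA are indistinguishable), which is one reason why combining several cleavage reactions yields more sequence information.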
As used herein, a "sequence ladder" is a collection of overlapping
polynucleotides, prepared from a single DNA or RNA template, which share a
common end, usually the 5' end, but which differ in length because they terminate at
different sites at the opposite end. The sites of termination coincide with the sites of
occurrence of one of the four nucleotides, A, G, C or T/U, in the template. Thus the
lengths of the polynucleotides collectively specify the intervals at which one of the
four nucleotides occurs in the template DNA fragment. A set of four such sequence
ladders, one specific for each of the four nucleotides, specifies the intervals at which
all four nucleotides occur, and therefore provides the complete sequence of the
template DNA fragment. As used herein, the term "sequence ladder" also refers to
the set of four sequence ladders required to determine a complete DNA sequence.
The process of obtaining the four sequence ladders to determine a complete DNA
sequence is referred to as "generating a sequence ladder."
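The way four ladders jointly specify a sequence can be sketched in Python: each ladder is represented as the set of product lengths ending at one nucleotide, and the sequence is read off position by position. This is an idealized model under stated assumptions (lengths counted from the shared 5' end starting at 1, one band per position, no mobility or mass corrections), and the function names are hypothetical. A position with no band, or with bands in more than one ladder, is reported as N.

```python
from __future__ import annotations


def sequence_from_ladders(ladders: dict[str, set[int]]) -> str:
    """Rebuild a sequence from four sequence ladders.

    `ladders` maps each nucleotide to the set of product lengths that end at
    that nucleotide, measured from the common 5' end.
    """
    length = max((max(lengths) for lengths in ladders.values() if lengths), default=0)
    seq = []
    for pos in range(1, length + 1):
        calls = [nt for nt, lengths in ladders.items() if pos in lengths]
        # Exactly one ladder should have a band at each position.
        seq.append(calls[0] if len(calls) == 1 else "N")
    return "".join(seq)


def ladders_from_sequence(seq: str) -> dict[str, set[int]]:
    """Simulate ideal ladders for a known sequence (useful for testing)."""
    return {nt: {i + 1 for i, b in enumerate(seq) if b == nt} for nt in "ACGT"}


ladders = ladders_from_sequence("GATTACA")
print(sequence_from_ladders(ladders))  # GATTACA
ladders["T"].discard(3)                # simulate a missing band at position 3
print(sequence_from_ladders(ladders))  # GANTACA
```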

As used herein, "cell senescence selection" refers to a process by which cells
that are susceptible to being killed by a particular chemical only when the cells are
actively growing (e.g., without limitation, bacteria which can be killed by antibiotics
only when they are growing) are used to find a polymerase which will incorporate a
modified nucleotide into a polynucleotide. The procedure requires that, when a
particular polymerase which has been introduced into the cell line incorporates a
modified nucleotide, that incorporation produces changes in the cells which cause
them to senesce, i.e., to stop growing. When cell colonies, some members of which
contain the modified nucleotide-incorporating polymerase and some members of
which do not, are then exposed to the chemical, only those cells which do not contain
the polymerase are killed. The cells are then placed in a medium where cell growth
is reinitiated, i.e., a medium without the chemical or the modified nucleotide, and
those cells which grow are separated and the polymerase is isolated from them.

As used herein, a "chemical oxidant" refers to a reagent capable of increasing
the oxidation state of a group on a molecule. For instance, without limitation, a
hydroxyl group (-OH) can be oxidized to a keto group. For example and without
limitation, potassium permanganate, t-butyl hypochlorite, m-chloroperbenzoic acid,
hydrogen peroxide, sodium hypochlorite, ozone, peracetic acid, potassium
persulfate, and sodium hypobromite are chemical oxidants.
As used herein, a "chemical base" refers to a chemical which, in aqueous
medium, has a pK greater than 7.0. Examples of chemical bases are, without
limitation, alkali (sodium, potassium, lithium) and alkaline earth (calcium,
magnesium, barium) hydroxides, sodium carbonate, sodium bicarbonate, trisodium
phosphate, ammonium hydroxide and nitrogen-containing organic compounds such
as pyridine, aniline, quinoline, morpholine, piperidine and pyrrole. These may be
used as aqueous solutions which may be mild (usually due to dilution) or strong
(concentrated solutions). A chemical base also refers to a strong non-aqueous
organic base; examples of such bases include, without limitation, sodium
methoxide, sodium ethoxide and potassium t-butoxide.

As used herein, the term "acid" refers to a substance which dissociates on
solution in water to produce one or more hydrogen ions. The acid may be inorganic
or organic. The acid may be strong, which generally implies highly concentrated, or
mild, which generally implies dilute. It is, of course, understood that acids inherently
have different strengths; e.g., sulfuric acid is much stronger than acetic acid and this
factor may also be taken into consideration when selecting the appropriate acid to
use in conjunction with the methods described herein. The proper choice of acid
will be apparent to those skilled in the art from the disclosures herein. Preferably,
the acids used in the methods of this invention are mild. Examples of inorganic
acids are, without limitation, hydrochloric acid, sulfuric acid, phosphoric acid, nitric
acid and boric acid. Examples, without limitation, of organic acids are formic acid,
acetic acid, benzoic acid, p-toluenesulfonic acid, trifluoroacetic acid, naphthoic acid,
uric acid and phenol.

An "electron-withdrawing group" refers to a chemical group which, by virtue of
its greater electronegativity, inductively draws electron density away from nearby
groups and toward itself, leaving the less electronegative group with a partial
positive charge. This partial positive charge, in turn, can stabilize a negative charge
on an adjacent group, thus facilitating any reaction which involves a negative charge,
either formal or in a transition state, on the adjacent group. Examples of electron-withdrawing
groups include, without limitation, cyano (-C≡N), azido (-N3), nitro
(-NO2), halo (F, Cl, Br, I), hydroxy (-OH), thiohydroxy (-SH) and ammonium (-NH3+).

An "electron-withdrawing element," as used herein, refers to an atom which is
more electronegative than carbon so that, when placed in a ring, the atom draws
electrons to it which, as with an electron-withdrawing group, results in nearby atoms
being left with a partial positive charge. This renders the nearby atoms susceptible
to nucleophilic attack. It also tends to stabilize, and therefore favor the formation of,
negative charges on other atoms attached to the positively charged atom.

An "electrophile" or "electrophilic group" refers to a group which, when it 
reacts with a molecule, takes a pair of electrons from the molecule. Examples of 
some common electrophiles are, without limitation, iodine and aromatic nitrogen 
cations. 

An "alkyl" group as used herein refers to a 1 to 20 carbon atom straight or
branched, unsubstituted group. Preferably the group consists of a 1 to 10 carbon
atom chain; most preferably, it is a 1 to 4 carbon atom chain. As used herein "1 to
20," etc. carbon atoms means 1 or 2 or 3 or 4, etc. up to 20 carbon atoms in the
chain.

A "mercapto" group refers to an -SH group.

An "alkylating agent" refers to a molecule which is capable of introducing an
alkyl group into a molecule. Examples, without limitation, of alkylating agents include
methyl iodide, dimethyl sulfate, diethyl sulfate, ethyl bromide and butyl iodide.

As used herein, the terms "selective," "selectively," "substantially,"
"essentially," "uniformly" and the like mean that the indicated event occurs to a
particular degree. In particular, the percent incorporation of a modified nucleotide is
greater than 90%, preferably greater than 95%, most preferably greater than 99%;
or the selectivity for cleavage at a modified nucleotide is greater than 10X, preferably
greater than 25X, most preferably greater than 100X that of other nucleotides,
natural or modified; or the percent cleavage at a modified nucleotide is greater than
90%, preferably greater than 95%, most preferably greater than 99%.
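The numerical thresholds in this definition can be applied mechanically to measured values, as in the following Python sketch. The function names and tier labels are hypothetical; the thresholds are the ones stated above, and selectivity is taken, as an assumption, to be the ratio of the cleavage rate at the modified nucleotide to the cleavage rate at other nucleotides.

```python
# Thresholds from the definition above: (base level, preferred, most preferred).
PERCENT_TIERS = (90.0, 95.0, 99.0)       # percent incorporation or percent cleavage
SELECTIVITY_TIERS = (10.0, 25.0, 100.0)  # fold selectivity over other nucleotides


def grade(value, thresholds):
    """Place a measured value into the tiers defined in the text (labels hypothetical)."""
    base, preferred, most_preferred = thresholds
    if value > most_preferred:
        return "most preferred"
    if value > preferred:
        return "preferred"
    if value > base:
        return "meets definition"
    return "below threshold"


def selectivity(rate_at_modified, rate_elsewhere):
    """Assumed definition: cleavage rate at the modified nucleotide over other sites."""
    return rate_at_modified / rate_elsewhere


print(grade(97.0, PERCENT_TIERS))                          # preferred
print(grade(selectivity(0.95, 0.02), SELECTIVITY_TIERS))  # preferred
```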

As used herein, "diagnosis" refers to determining the nature of a disease or
disorder. The methods of this invention may be used in any form of diagnosis
including, without limitation, clinical diagnosis (a diagnosis made from a study of the
signs and symptoms of a disease or disorder, where such sign or symptom is the
presence of a variance), differential diagnosis (the determination of which of two or
more diseases with similar symptoms is the one from which a patient is suffering), etc.
By "prognosis," as used herein, is meant a forecast of the probable
course and/or outcome of a disease. In the context of this invention, the methods
described herein may be used to follow the effect of a genetic variance or variances
on disease progression or treatment response. It is to be noted that using the
methods of this invention as a prognostic tool does not require knowledge of the
biological impact of a variance. The detection of a variance in an individual afflicted
with a particular disorder or the statistical association of the variance with the
disorder is sufficient. The progression or response to treatment of patients with a
particular variance can then be traced throughout the course of the disorder to guide
therapy or other disorder management decisions.

By "having a genetic component" is meant that a particular disease, disorder
or response to treatment is known or suspected to be related to a variance or
variances in the genetic code of an individual afflicted with the disease or disorder.

As used herein, an "individual" refers to any higher life form including reptiles
and mammals, in particular human beings. However, the methods of this invention
are useful for the analysis of the nucleic acids of any biological organism.

BRIEF DESCRIPTION OF THE TABLES 
Table 1 is a description of several procedures presently in use for the 
detection of variance in DNA. 
Table 2 shows the molecular weights of the four DNA nucleotide
monophosphates and the mass difference between each pair of nucleotides.

Table 3 shows the masses of all possible 2mers, 3mers, 4mers and 5mers of 
the DNA nucleotides in Table 2. 

Table 4 shows the masses of all possible 2mers, 3mers, 4mers, 5mers,
6mers and 7mers that would be produced by cleavage at one of the four nucleotides
and the mass differences between neighboring oligonucleotides.

Table 5 shows the mass changes that will occur for all possible point
mutations (replacement of one nucleotide by another) and the theoretical maximum
size of a polynucleotide in which a point mutation should be detectable by mass
spectrometry using mass spectrometers of varying resolving powers.
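The quantities behind a table of this kind can be sketched as follows. The Python code below computes the mass change for each possible point substitution from approximate average residue masses (assumed values, not those of Table 2), and estimates the longest polynucleotide in which a given shift remains resolvable under one simple assumed criterion: two peaks are separable when their difference is at least m/R, where m is the polynucleotide mass and R is the resolving power. The numbers it produces are illustrative and are not the values reported in Table 5.

```python
from itertools import combinations

# Approximate average residue masses (Da); assumed, illustrative values.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
MEAN_RESIDUE = sum(RESIDUE_MASS.values()) / 4  # about 308.95 Da per nucleotide


def point_mutation_shifts():
    """Mass change (Da) for each possible single-nucleotide substitution."""
    return {
        f"{x}<->{y}": round(abs(RESIDUE_MASS[x] - RESIDUE_MASS[y]), 2)
        for x, y in combinations("ACGT", 2)
    }


def max_detectable_length(delta_mass, resolving_power):
    """Largest n with n * MEAN_RESIDUE / resolving_power <= delta_mass.

    Assumes a mass shift is detectable when it is at least m / R.
    """
    return int(delta_mass * resolving_power / MEAN_RESIDUE)


shifts = point_mutation_shifts()
print(shifts)  # the A<->T substitution gives the smallest shift, 9.01 Da
print(max_detectable_length(shifts["A<->T"], 1000))  # 29
```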
Table 6 shows the actual molecular weight differences observed in an
oligonucleotide using the method of this invention; the difference reveals a hitherto 
unknown variance in the oligonucleotide. 

Table 7 shows all of the masses obtained by cleavage of an exemplary
20mer in four separate reactions, each reaction being specific for one of the DNA
nucleotides; i.e., at A, C, G and T.

BRIEF DESCRIPTION OF THE FIGURES 
Figure 1 shows detection of a single base change (a T to C) in 66 base-pair 
fragments obtained by PCR. 
Figure 2 shows the molecular weights of the main fragments expected from
cleavage of a polynucleotide modified by incorporation of the modified nucleotide 7-
methylguanine in place of G.

Figure 3 shows polyacrylamide gel analysis of polynucleotides with modified
G before and after cleavage. Two polynucleotides differing by a single nucleotide
(RFC vs. RFC mut) were analyzed.

Figure 4 is a mass spectrogram, with magnified insert, of the 66 base-pair 
fragment PCR amplified in the presence of RFC. 

Figure 5 shows the mass spectrogram, with magnified insert, of the cleavage
products from a 66 base polynucleotide with complete substitution of 7-methylG for
G and subsequent cleavage at G.

Figure 6 is a mass spectrogram of two oligonucleotides differing by only one 
nucleotide; i.e., a G is present only in the larger oligonucleotide. 

Figure 7 shows a sequencing gel of a linearized, single-stranded M13
template. The template was extended to 87 nucleotides in the presence of 5'-amino
dTTP using exo-minus Klenow polymerase and then partially cleaved with acetic
acid.

Figure 8 shows a purified full-length extension product of the fragment in 
Figure 7 before and after chemical cleavage. 

Figure 9 shows results of a restriction endonuclease digestion of the fully
extended primer/template complex of Figures 7 and 8 and also shows extension of
the primer in the presence of 5'-aminoT to form a 7.2 Kb polynucleotide.
Figure 10 shows the resolution obtained upon high performance liquid
chromatography (HPLC) separation of an Hae III restricted PhiX174 DNA. 

Figure 11 shows the sequence ladder obtained from a polynucleotide in
which T was replaced with 5'-amino T, followed by cleavage with acetic acid and
denaturing polyacrylamide gel electrophoresis.

Figure 12 shows an example of dinucleotide cleavage in which a 
ribonucleotide is 5' of a bridging thiol ester. 

Figure 13 shows the efficiency of complete mononucleotide cleavage or
complete dinucleotide cleavage for variance detection in 50, 100, 150, 200 and 250
nucleotide polynucleotides.

Figures 14 through 18 show various aspects of long range DNA sequencing 
using chemically cleavable modified nucleotides. 

Figure 14 shows a hypothetical shotgun sequencing analysis of a 10 kb clone
and illustrates the principle and advantages of long range DNA sequencing by
chemical cleavage of polymerase incorporated mononucleotides.

Figure 15 illustrates the sequencing of a 2.7 kb plasmid by primer extension
in the presence of 4 dNTPs and one 5'-amino-dNTP followed by restriction
endonuclease digestion, end labeling, chemical cleavage and electrophoretic
resolution of the resulting sequence ladder.

Figure 16 shows the separation of partially 5'-aminoT substituted Hindi
restriction endonuclease fragments by HPLC.

Figure 17 is a comparison of sequence ladders produced by dideoxy
termination and by acid cleavage of partially 5'-amino nucleotide substituted primer
extension products. The chemical cleavage procedure results in a homogeneous
distribution of labeled products over greater than 4000 nucleotides.

Figure 18 is a comparison of sequence ladders produced by dideoxy 
termination and by acid cleavage of partially 5'-amino-nucleotide substituted primer 
extension products as visualized on an autoradiogram. 

Figure 19 is an illustration of the DNA fragments produced by restriction
endonuclease cleavage of a 700 nt DNA fragment compared to fragments produced
by dinucleotide chemical cleavage.
Figure 20 shows a dinucleotide cleavage employing a ribonucleotide and a
5'-amino-nucleotide in a 5' to 3' orientation. 

Figure 21 compares the cleavage products obtained by base cleavage of a
ribonucleotide and 5'-aminonucleotide substituted DNA fragment with the cleavage
products obtained by acid cleavage.

Figure 22 shows the results of cleavage of a DNA fragment substituted with
ribo-G and 5'-amino-TTP. The autoradiogram shows complete cleavage at GT and
no background cleavage at G or T.

Figure 23 shows the results of cleavage of a DNA fragment incorporating
ribo-A and 5'-amino-TTP. Again, the autoradiogram shows complete and completely
site specific cleavage.

Figure 24 is a mass spectrogram of the cleavage products of the DNA 
fragment of Fig. 23. All fragments except the 2 nt fragment are observed. 

Figure 25 depicts the results of dinucleotide cleavage of a 257 nt primer
extension product into which ribo-A and 5'amino-UP have been incorporated.
Figure 26 is a MALDI-TOF mass spectrogram of the AT dinucleotide 
cleavage products of the primer extension product of Fig. 25. 
Figures 27 - 33 demonstrate the application of mononucleotide cleavage to 
genotyping by mass spectrometry, capillary electrophoresis and FRET. 
Figure 27 is a schematic illustration of genotyping (variance detection at a
known variant site).

Figure 28 shows the results of genotyping a dA vs dG variance in the 
transferrin receptor by PCR amplification in the presence of modified ddA followed 
by chemical cleavage at the modified nucleotide. 
Figure 29 exemplifies genotyping using modified nucleotide
incorporation/chemical cleavage followed by mass spectrometric analysis of the
resulting fragments.

Figure 30 demonstrates genotyping of a modified nucleotide containing 
transferrin receptor by chemical cleavage followed by MALDI-TOF. 
Figure 31 demonstrates distinguishing features of MALDI-TOF genotyping.
Figure 32 demonstrates genotyping of a transferrin receptor polymorphism
by chemical cleavage of a modified nucleotide transferrin receptor followed by slab 
gel or capillary electrophoresis. 

Figure 33 illustrates schematically FRET detection of variant polynucleotides
after chemical cleavage of a modified polynucleotide.

DETAILED DESCRIPTION OF THE INVENTION 
In one aspect, this invention relates to a method for detecting a variance in
the nucleotide sequence among related polynucleotides by replacing a natural
nucleotide in a polynucleotide at substantially each point of incorporation of the
natural nucleotide with a modified nucleotide, cleaving the modified polynucleotide at
substantially each point of incorporation of the modified nucleotide, determining the
mass of the fragments obtained and then comparing the masses with those
expected from a related polynucleotide of known sequence or, if the sequence of a
related polynucleotide is unknown, by repeating the above steps with a second
related polynucleotide and then comparing the masses of the fragments obtained
from the two related polynucleotides. Of course, it is understood that the methods of
this invention are not limited to any particular number of related polynucleotides; as
many as are needed or desired may be used.
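The comparison of fragment masses from two related polynucleotides can be sketched in Python as a multiset difference. The sketch rests on stated assumptions: complete cleavage at every occurrence of one nucleotide, a hypothetical convention in which the cleaved nucleotide is lost so that fragments are the runs between its occurrences, approximate average residue masses, and no end-group corrections. The function names are hypothetical.

```python
from collections import Counter

# Approximate average residue masses (Da); assumed values, end groups ignored.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def fragment_masses(seq, nt):
    """Multiset of fragment masses after complete cleavage at every `nt`.

    Hypothetical convention: the cleaved nucleotide is lost, so fragments are
    the runs of sequence between its occurrences.
    """
    return Counter(
        round(sum(RESIDUE_MASS[b] for b in frag), 2)
        for frag in seq.split(nt)
        if frag
    )


def compare_related(reference, test, nt):
    """Report fragment masses present in one polynucleotide but not the other."""
    ref, tst = fragment_masses(reference, nt), fragment_masses(test, nt)
    return {
        "only_in_reference": sorted((ref - tst).elements()),
        "only_in_test": sorted((tst - ref).elements()),
    }


# A single T -> C change inside the second G-delimited fragment.
print(compare_related("AACGTTAGCTTGCA", "AACGTCAGCTTGCA", "G"))
# {'only_in_reference': [921.61], 'only_in_test': [906.59]}
```

Under this model a variance that only reorders bases within one fragment (for example TTA versus TAT) leaves every mass unchanged and would not be detected by this single reaction.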

In another aspect, this invention relates to a method for detecting a variance
in the nucleotide sequence among related polynucleotides by replacing two natural
nucleotides in a polynucleotide with two modified nucleotides, the modified
nucleotides being selected so that, under the chosen reaction conditions, they do not
individually impart selective cleavage properties on the modified polynucleotide.
Rather, when the two modified nucleotides are contiguous, i.e., the natural
nucleotides being replaced were contiguous in the unmodified polynucleotide, they
act in concert to impart selective cleavage properties on the modified polynucleotide.
In addition to mere proximity, it may also be necessary, depending on the modified
nucleotides and reaction conditions selected, that the modified nucleotides be in the
proper spatial relationship. For example, without limitation, 5'A-3'G might be
susceptible to cleavage while 5'G-3'A might not. As above, once substitution of the
modified nucleotides for the natural nucleotides has been accomplished, the
modified nucleotide pair is cleaved, the masses of the fragments are determined and
the masses are compared, either to the masses expected from a related
polynucleotide of known sequence or, if the sequence of at least one of the related
polynucleotides is not known, to the masses obtained when the procedure is
repeated with other related polynucleotides.
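The orientation requirement can be illustrated by locating only those positions where the first modified nucleotide lies immediately 5' of the second. The Python sketch below assumes, for illustration only, that the sequence is written 5' to 3' and that cleavage occurs between the two nucleotides of each qualifying pair; the actual cleavage chemistry determines the fragment ends, and the function names are hypothetical.

```python
from __future__ import annotations


def dinucleotide_cut_sites(seq: str, first: str, second: str) -> list[int]:
    """0-based indices i where seq[i] == first and seq[i + 1] == second (5'->3')."""
    return [i for i in range(len(seq) - 1) if seq[i] == first and seq[i + 1] == second]


def dinucleotide_fragments(seq: str, first: str, second: str) -> list[str]:
    """Cut between the two nucleotides of every 5'-first-second-3' step."""
    cuts = [i + 1 for i in dinucleotide_cut_sites(seq, first, second)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


seq = "TAGGAAGCGA"
print(dinucleotide_cut_sites(seq, "A", "G"))  # [1, 5]  (5'A-3'G steps only)
print(dinucleotide_cut_sites(seq, "G", "A"))  # [3, 8]  (ignored if only 5'A-3'G cleaves)
print(dinucleotide_fragments(seq, "A", "G"))  # ['TA', 'GGAA', 'GCGA']
```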
In another aspect, this invention relates to methods for detecting mono- or
dinucleotide cleavage products by electrophoresis or fluorescence resonance
energy transfer (FRET). In FRET-based assays, the presence or absence of
fluorescence over a specified wavelength range is monitored. Both these methods
are particularly well-suited for detecting variance at a single site in a polynucleotide
where the variance has been previously identified. Knowledge of the particular
variance permits the design of electrophoretic or FRET reagents and procedures
specifically suited to the rapid, low cost, automatable determination of the status of
the variant nucleotide(s). Examples of electrophoretic and FRET detection of
cleavage products are described below and in the Figures.

The use of the variance detection methods of this invention to develop, and to
serve as, diagnostic or prognostic tools for detecting predisposition to certain
diseases and disorders is another aspect of this invention.

In the development of diagnostic tools, the methods of this invention would be
employed to compare the DNA of a test subject, which is displaying symptoms of a
particular disease or disorder known or suspected to be genetically related or is
displaying a desirable characteristic such as a health-enhancing or economically
valuable trait (e.g., growth rate, pest resistance, crop yield), with the DNA of
healthy members of the same population and/or members of the population which
exhibit the same disease, disorder or trait. The test subject may be, without
limitation, a human, any other mammal such as rat, mouse, dog, cat, horse, cow,
pig, sheep, goat, etc., a cold-blooded species such as fish, or an agriculturally
important crop such as wheat, corn, cotton or soybeans. The detection of a statistically
significant variance between the healthy members of the population and members of
the population with the disease or disorder would serve as substantial evidence of
the utility of the test for identifying subjects having or at risk of having the disease or
disorder. This could lead to very useful diagnostic tests.
Using the methods of this invention as a diagnostic or prognostic tool, it is
entirely unnecessary to know anything about the variance being sought; i.e., its
exact location, whether it is an addition, deletion or substitution or what nucleotide(s)
have been added, deleted or substituted. The mere detection of the presence of the
variance accomplishes the desired task, to diagnose or predict the incidence of a
disease or disorder in a test subject. In most instances, however, it would be
preferable to be able to create a specific genotyping test for a particular variance
with diagnostic or prognostic utility.

Particularly useful aspects of the genotyping methods described herein are
ease of assay design, low cost of reagents and suitability of the cleavage products
for detection by a variety of methods including, without limitation, electrophoresis,
mass spectrometry and fluorescent detection.

In another aspect of this invention, the complete sequence of a
polynucleotide may be determined by repeating the above method involving the
replacement of one natural nucleotide at each point of occurrence of the natural
nucleotide with one modified nucleotide followed by cleavage and mass detection.
In this embodiment, the procedure is carried out four times, once for each of the
natural nucleotides; i.e., in the case of DNA, for example but without limitation, each
of dA, dC, dG and dT is replaced with a modified nucleotide in four separate experiments.
The masses obtained from the four cleavage reactions can then be used to
determine the complete sequence of the polynucleotide. This method is applicable
to polynucleotides prepared by primer extension or amplification by, for example,
PCR; in the latter case both strands undergo modified nucleotide replacement.

An additional experiment may be necessary should the preceding procedure
leave any nucleotide positions in the sequence ambiguous (see, e.g., the Examples
section, infra). This additional experiment may be a repetition of the above procedure
using the complementary strand of the DNA being studied if the method involves
primer extension. The additional experiment may also be the use of the
above-described method for replacing two natural nucleotides with two modified
nucleotides, cleaving where the modified nucleotides are contiguous and then
determining the masses of the fragments obtained. Knowledge of the position of
contiguous nucleotides in the target polynucleotide may resolve the ambiguity.
Another experiment which might be employed to resolve any ambiguity occurring
in the main experiment is one-pass Sanger sequencing followed by gel
electrophoresis, which is fast and easy but which alone would not afford highly
accurate sequencing. Thus, in conjunction with the methods of this invention, an
alternative sequencing method known in the art might, in the case of a specific
ambiguity, provide the information necessary to resolve the ambiguity.
Combinations of these procedures might also be used. The value of using different
procedures lies in the generally recognized observation that each sequencing
method has certain associated artifacts that compromise its performance, but the
artifacts are different for different procedures. Thus, when the goal is highly
accurate sequencing, using two or more sequencing techniques which would tend to
cancel out each other's artifacts should have great utility. Other additional
experiments which might resolve an ambiguity will, based on the disclosures herein
and the specific sequence ambiguity at issue, be apparent to those skilled in the art
and are, therefore, deemed to be within the scope of this invention.

In yet another aspect of this invention, the modified nucleotide cleavage reactions described herein may result in the formation of a covalent bond between one of the cleavage fragments and another molecule. This molecule may serve a number of purposes. It may contain a directly detectable label or a moiety that enhances detection of the cleavage products during mass spectrometric, electrophoretic or fluorogenic analysis. For example, without limitation, the moiety may be a dye, a radioisotope, an ion trap to enhance ionization efficiency, an excitable group which can enhance desorption efficiency or simply a large molecule which globally alters desorption and/or ionization characteristics. The labeling reaction may be partial or complete. One use of homogeneously labeled DNA fragments of controllable size is in DNA hybridization, for example, as hybridization probes for DNA on high-density arrays such as DNA chips.

An additional aspect of this invention is the replacement of a natural nucleotide with a modified nucleotide at only a percentage of the points of occurrence of that natural nucleotide in a polynucleotide. This percentage may be from about 0.01% to about 95%, preferably from about 0.01% to about 50%, more preferably from about 0.01% to about 10% and most preferably from about 0.01% to about 1%. The percent replacement is selected to complement the efficiency of the selected cleavage reaction. That is, if a cleavage reaction of low efficiency is selected, then a higher percentage of substitution is permissible; if a cleavage reaction of high efficiency is selected, then a low percentage of replacement is preferred. The desired result is that, on average, each individual strand of polynucleotide is cleaved once, so that a sequencing ladder, such as that described for the Maxam-Gilbert and Sanger procedures, can be developed. Since the cleavage reactions described herein are of relatively high efficiency, low percentages of replacement are preferred to achieve the desired single cleavage per polynucleotide strand. Low percentages of replacement may also be more readily achieved with available polymerases. However, based on the disclosures herein, other cleavage reactions of varying degrees of efficiency will be apparent to those skilled in the art and, as such, are within the scope of this invention. It is, in fact, an aspect of this invention that a polynucleotide in which a natural nucleotide has been replaced with a modified nucleotide at substantially each point of occurrence may still be used to generate the sequencing ladder, provided that a cleavage reaction of sufficiently low efficiency is used. Expressed as the percentage cleavage at the points of incorporation of a modified nucleotide in a modified polynucleotide, that efficiency may be from about 0.01% to 50%, preferably from about 0.01% to 10% and, most preferably, from about 0.01% to about 1%. At the most preferred level of efficiency, about 0.01% to about 1%, each strand of a fully modified polynucleotide should, on average, be cleaved only once.
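The trade-off between percent replacement and cleavage efficiency can be made concrete with a simple model. The sketch below assumes, as a simplification not stated in the text, that each of the `n_sites` occurrences of the natural nucleotide is replaced and cleaved independently, so the number of cuts per strand is binomially distributed with per-site probability p = (fraction replaced) × (cleavage efficiency). The function names and the 25-site example strand are illustrative only.

```python
import math


def per_site_cut_probability(frac_replaced: float, cleavage_eff: float = 1.0) -> float:
    """Probability that a given site both carries a modified nucleotide and is cleaved."""
    return frac_replaced * cleavage_eff


def expected_cuts(n_sites: int, frac_replaced: float, cleavage_eff: float = 1.0) -> float:
    """Mean number of cuts per strand under the independent-site (binomial) model."""
    return n_sites * per_site_cut_probability(frac_replaced, cleavage_eff)


def prob_k_cuts(n_sites: int, k: int, frac_replaced: float, cleavage_eff: float = 1.0) -> float:
    """Binomial probability that a strand is cut exactly k times."""
    p = per_site_cut_probability(frac_replaced, cleavage_eff)
    return math.comb(n_sites, k) * p**k * (1.0 - p) ** (n_sites - k)


def prob_exactly_one_cut(n_sites: int, frac_replaced: float, cleavage_eff: float = 1.0) -> float:
    """Fraction of strands cut exactly once, i.e., contributing one ladder band."""
    return prob_k_cuts(n_sites, 1, frac_replaced, cleavage_eff)


def frac_for_one_cut(n_sites: int, cleavage_eff: float = 1.0) -> float:
    """Fraction replaced that gives, on average, one cut per strand."""
    return 1.0 / (n_sites * cleavage_eff)


if __name__ == "__main__":
    n = 25  # hypothetical strand with 25 occurrences of the chosen nucleotide
    # Partial (4%) replacement with fully efficient cleavage ...
    print(expected_cuts(n, 0.04), prob_exactly_one_cut(n, 0.04))
    # ... behaves, in this model, like full replacement with 4% cleavage efficiency.
    print(expected_cuts(n, 1.0, 0.04), prob_exactly_one_cut(n, 1.0, 0.04))
```

With 25 sites and p = 0.04, the model gives one cut per strand on average, with about 37.5% of strands cut exactly once and the rest either uncut or cut more than once. "On average" therefore describes the strand population rather than every individual strand.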

In another aspect, this invention relates to methods for producing and identifying polymerases with novel properties with respect to the incorporation and cleavage of modified nucleotides.

A. Nucleotide Modification and Cleavage 

(1) Base modification and cleavage 

A modified nucleotide may contain a modified base, a modified sugar, a modified phosphate ester linkage or a combination of these.

Base-modification is the chemical modification of the adenine, cytosine, guanine or thymine (or, in the case of RNA, uracil) moiety of a nucleotide such that the resulting chemical structure renders the modified nucleotide more susceptible to attack by a reagent than a nucleotide containing the unmodified base. The following are examples, without limitation, of base modification. Other such modifications of bases will become readily apparent to those skilled in the art in light of the disclosures herein and therefore are to be considered to be within the scope of this invention (e.g., the use of difluorotoluene; Liu, D., et al., Chem. Biol., 4:919-929, 1997; Moran, S., et al., Proc. Natl. Acad. Sci. USA, 94:10506-10511, 1997).

Some examples, without limitation, of such modified bases are described below.

1. Adenine (1) can be replaced with 7-deaza-7-nitroadenine (2). The 7-deaza-7-nitroadenine is readily incorporated into polynucleotides by enzyme-catalyzed polymerization. The 7-nitro group activates C-8 to attack by chemical base such as, without limitation, aqueous sodium hydroxide or aqueous piperidine, which eventually results in specific strand scission. Verdine, et al., JACS, 1996, 118:6116-6120;







We have found that cleavage with piperidine is not always complete, whereas complete cleavage is the desired result. However, when the cleavage reaction is carried out in the presence of a phosphine derivative, for example, without limitation, tris(2-carboxyethyl)phosphine (TCEP), and a base, complete cleavage is obtained. An example of such a cleavage reaction is as follows: DNA modified by incorporation of 7-nitro-7-deaza-2'-deoxyadenosine is treated with 0.2 M TCEP/1 M piperidine/0.5 M Tris base at 95° C for one hour. Denaturing polyacrylamide gel (20%) analysis showed complete cleavage. Other bases such as, without limitation, NH4OH can be used in place of the piperidine and Tris base. This procedure, i.e., the use of a phosphine in conjunction with a base, should be applicable to any cleavage reaction in which the target polynucleotide has been substituted with a modified nucleotide which is labile to piperidine.
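As a worked example of making up the cleavage mixture quoted above (0.2 M TCEP/1 M piperidine/0.5 M Tris base), the usual dilution relation C1V1 = C2V2 gives the volume of each stock. The stock concentrations here are assumptions chosen for illustration, not values from the source: a 0.5 M TCEP stock, a 2 M Tris base stock, and neat piperidine, whose molarity is estimated from a density of about 0.862 g/mL and a molar mass of 85.15 g/mol. TCEP is commonly supplied as its hydrochloride, which consumes some base; the sketch does not account for that.

```python
# Hypothetical stock solutions (molar); adjust to what is actually on hand.
PIPERIDINE_NEAT_M = 0.862 * 1000 / 85.15  # density (g/mL) * 1000 / molar mass, about 10.1 M

STOCKS_M = {"TCEP": 0.5, "piperidine": PIPERIDINE_NEAT_M, "Tris base": 2.0}
TARGET_M = {"TCEP": 0.2, "piperidine": 1.0, "Tris base": 0.5}


def stock_volumes(final_volume_ul: float) -> dict[str, float]:
    """Volume of each stock (uL) for the target final concentrations, via C1*V1 = C2*V2.

    Whatever volume is left over ("DNA + water") is available for the sample and water.
    """
    volumes = {name: TARGET_M[name] * final_volume_ul / STOCKS_M[name] for name in TARGET_M}
    remainder = final_volume_ul - sum(volumes.values())
    if remainder < 0:
        raise ValueError("stocks too dilute for this final volume")
    volumes["DNA + water"] = remainder
    return volumes


if __name__ == "__main__":
    for name, vol in stock_volumes(100.0).items():
        print(f"{name}: {vol:.1f} uL")
```

For a 100 µL reaction this gives 40 µL of TCEP stock, about 9.9 µL of neat piperidine and 25 µL of Tris stock, leaving about 25 µL for the DNA sample and water.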

The product of cleavage with TCEP and base is unique. Mass spectrometry analysis was consistent with a structure having a phosphate-ribose-TCEP adduct at 3' ends and a phosphate moiety at 5' ends, i.e., structure 3.
How TCEP participates in the fragmentation of a modified polynucleotide is not presently known; however, without being bound by any particular theory, we believe that the mechanism may be the following:
The incorporation of the TCEP (or other phosphine) into the cleavage product should be a very useful method for labeling fragmented polynucleotides at the same time that cleavage is being performed. By using an appropriately functionalized phosphine that remains capable of forming an adduct at the 3'-end ribose as described above, functionalities such as, without limitation, mass tags, fluorescence tags, radioactive tags and ion-trap tags could be incorporated into a fragmented polynucleotide. Phosphines that contain one or more tags and that are capable of covalently bonding to a cleavage fragment constitute another aspect of this invention. Likewise, the use of such tagged phosphines as a method for labeling polynucleotide fragments is another aspect of this invention.

While other phosphines, which may become apparent to those skilled in the art based on the disclosures herein, may be used to prepare labeled phosphines for incorporation onto nucleotide fragments, TCEP is a particularly good candidate for labeling. For instance, the carboxy (-C(O)OH) groups may be modified directly by numerous techniques, for example, without limitation, reaction with an amine, alcohol or mercaptan in the presence of a carbodiimide to form an amide, ester or thioester, as shown in the following reaction scheme:



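$$\mathrm{P[(CH_2)_2COOH]_3} \xrightarrow[\mathrm{DCC}]{\mathrm{R^1M^1H}} \mathrm{[HOOC(CH_2)_2]_2P(CH_2)_2C(O)M^1R^1} \xrightarrow[\mathrm{DCC}]{\mathrm{R^2M^2H}} \mathrm{HOOC(CH_2)_2P[(CH_2)_2C(O)M^1R^1][(CH_2)_2C(O)M^2R^2]}$$

(left to right: tris(2-carboxyethyl)phosphine, monomodified derivative, bismodified derivative; R1M1H and R2M2H are alcohols, amines or thiols; DCC is dicyclohexylcarbodiimide)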

When a carboxy group is reacted with a carbodiimide in the absence of a nucleophile (the amine in this case), the adduct between the carbodiimide and the carboxy group may rearrange to form a stable N-acylurea. If the carbodiimide contains a fluorophore, the resultant phosphine will then carry that fluorophore, as shown in the following reaction scheme:

wherein M1 and M2 are independently O, NH, NR or S, and R1 and R2 are mass tags, fluorescent tags, radioactive tags, ion-trap tags or combinations thereof.
$$\mathrm{R^1N{=}C{=}NR^2} + \mathrm{P[(CH_2)_2COOH]_3} \longrightarrow \mathrm{[HOOC(CH_2)_2]_2P(CH_2)_2C(O){-}O{-}C({=}NR^1){-}NHR^2} \longrightarrow \mathrm{[HOOC(CH_2)_2]_2P(CH_2)_2C(O){-}N(R^1){-}C(O){-}NHR^2}$$



Amino group-containing fluorophores such as fluoresceinyl glycine amide (5-(aminoacetamido)fluorescein), 7-amino-4-methylcoumarin, 2-aminoacridone, 5-aminofluorescein, 1-pyrenemethylamine and 5-aminoeosin may be used to prepare the labeled phosphines of this method. Amino derivatives of lucifer yellow and Cascade Blue may also be used, as can amino derivatives of biotin. In addition, hydrazine derivatives such as rhodamine and Texas Red hydrazines may be useful in this method.
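When a tagged phosphine serves as a mass tag, the mass of the label-bearing phosphine follows from simple condensation arithmetic: each amide bond formed between a TCEP carboxy group and an amine releases one water. The sketch below computes average masses of the mono-, bis- and tris-amides of TCEP (free acid, C9H15O6P) with 7-amino-4-methylcoumarin (C10H9NO2), one of the amines listed above. It assumes the free-acid form of TCEP and average atomic masses. It does not predict the mass of the cleavage-fragment adduct itself, whose structure is described only qualitatively here.

```python
# Standard average atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}

TCEP = {"C": 9, "H": 15, "O": 6, "P": 1}                  # tris(2-carboxyethyl)phosphine, free acid
AMINO_METHYLCOUMARIN = {"C": 10, "H": 9, "N": 1, "O": 2}  # 7-amino-4-methylcoumarin
WATER = {"H": 2, "O": 1}


def formula_mass(counts: dict[str, int]) -> float:
    """Average molar mass from an element -> count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in counts.items())


def amide_derivative_mass(amine: dict[str, int], n_amides: int) -> float:
    """Mass of TCEP after converting n_amides of its three COOH groups to amides.

    Each amide bond adds the amine and removes one water (condensation).
    """
    if not 0 <= n_amides <= 3:
        raise ValueError("TCEP has three carboxy groups")
    return formula_mass(TCEP) + n_amides * (formula_mass(amine) - formula_mass(WATER))


if __name__ == "__main__":
    for k in range(4):
        print(k, round(amide_derivative_mass(AMINO_METHYLCOUMARIN, k), 2))
```

Each coumarin amide adds about 157.17 g/mol, so the mono- and bis-modified derivatives differ from free TCEP by readily resolved mass increments.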

Fluorescent diazoalkanes, such as, without limitation, 1-pyrenyldiazomethane, may also be used to form esters with TCEP.

Fluorescent alkyl halides may also react with the anion of the carboxy group, i.e., the C(O)O⁻ group, to form esters. Halides which might be used include, without limitation, panacyl bromide, 3-bromoacetyl-7-diethylaminocoumarin, 6-bromoacetyl-2-diethylaminonaphthalene, 5-bromomethylfluorescein, BODIPY® 493/503 methyl bromide, monobromobimanes and iodoacetamides such as coumarin iodoacetamide; these may serve as effective label-carrying moieties which covalently bond with TCEP.

Naphthalimide sulfonate ester reacts rapidly with the anions of carboxylic acids in acetonitrile to give adducts which are detectable by absorption at 259 nm down to 100 femtomoles and by fluorescence at 394 nm down to four femtomoles.

There are, furthermore, countless amine-reactive fluorescent probes available, and it is possible to convert TCEP into a primary amine derivative by the following reaction:
$$\mathrm{(CH_3)_3CO{-}C(O){-}NH(CH_2)_nNH_2} + \mathrm{P[(CH_2)_2COOH]_3} \xrightarrow{\mathrm{EDAC}} \mathrm{(CH_3)_3CO{-}C(O){-}NH(CH_2)_nNH{-}C(O)(CH_2)_2P[(CH_2)_2COOH]_2} \xrightarrow{\mathrm{CF_3COOH}} \mathrm{H_2N(CH_2)_nNH{-}C(O)(CH_2)_2P[(CH_2)_2COOH]_2}$$

The aminophosphine can then be used to form label-containing aminophosphines for use in the cleavage/labeling method described herein.
The above dyes and procedures for covalently bonding them to TCEP are but a few examples of the possible adducts which can be formed. A valuable source of additional such reagents and procedures is the catalog of Molecular Probes, Inc. Based on the disclosures herein and resources such as the Molecular Probes catalog, many other ways to modify phosphines, in particular TCEP, will be apparent to those skilled in the art. Those other ways to modify phosphines for use in the incorporation of labels into polynucleotide fragments during chemical cleavage of the polynucleotide are within the scope of this invention.

2. Cytosine (4) can be replaced with 5-azacytosine (5). 5-Azacytosine is likewise efficiently incorporated into polynucleotides by enzyme-catalyzed polymerization. 5-Azacytosine is susceptible to cleavage by chemical base, particularly aqueous base, such as aqueous piperidine or aqueous sodium hydroxide. Verdine, et al., Biochemistry, 1992, 31:11265-11273;
3(a). Guanine (6) can be replaced with 7-methylguanine (7), which can likewise be readily incorporated into polynucleotides by polymerases (Verdine, et al., JACS, 1991, 113:5104-5106) and is susceptible to attack by chemical base, such as, without limitation, aqueous piperidine (Siebenlist, et al., Proc. Natl. Acad. Sci. USA, 1980, 77:122); or,




3(b). Gupta and Kool (Chem. Commun., 1997, pp. 1425-26) have demonstrated that N6-allyl-dideoxyadenine, when incorporated into a DNA strand, will cleave on treatment with a mild electrophile, E+, in their case iodine. The proposed mechanism is shown in (Scheme 1):
Scheme 1 (electrophile (E+); glycosidic bond cleavage; aq. piperidine)

A similar procedure might be employed with guanine using the previously unreported 2-allylaminoguanine derivative 8, which can be prepared by the procedure shown in (Scheme 2):
Scheme 2

Other ways to synthesize compound 8 will become apparent based on the disclosures herein; such syntheses are considered within the spirit and scope of this invention. A polynucleotide strand into which the resulting N2-allylguanosine triphosphate has been incorporated should be susceptible to cleavage in a manner similar to that of the N6-allyladenine nucleotide of Gupta, i.e., by the mechanism shown in (Scheme 3):



Scheme 3 (electrophile (E+); glycoside bond cleavage; aq. piperidine)
4. Either thymine (9) or uracil (10) may be replaced with 5-hydroxyuracil (11) (Verdine, JACS, 1991, 113:5104). As with the above modified bases, the nucleotide prepared from 5-hydroxyuracil can also be incorporated into a polynucleotide by enzyme-catalyzed polymerization. Verdine, et al., JACS, 1993, 115:374-375. Specific cleavage is accomplished by first treating the 5-hydroxyuracil with an oxidizing agent, for instance, aqueous permanganate, and then with a chemical base such as, without limitation, aqueous piperidine (Verdine, ibid.).





5. Pyrimidines substituted at the 5-position with an electron-withdrawing group such as, without limitation, nitro, halo or cyano, should be susceptible to nucleophilic attack at the 6-position followed by base-catalyzed ring opening and subsequent degradation of the phosphate ester linkage. An example, which is not to be construed as limiting the scope of this technique in any manner, is shown in (Scheme 4) using 5-substituted cytidine. If the cleavage is carried out in the presence of tris(2-carboxyethyl)phosphine (TCEP), the adduct 10 may be obtained and, if the TCEP is functionalized with an appropriate moiety (q.v. supra), labeled polynucleotide fragments may be obtained.
(2) Sugar modification and cleavage

Modification of the sugar portion of a nucleotide may also afford a modified polynucleotide which is susceptible to selective cleavage at the site(s) of incorporation of such modification. In general, the sugar is modified to include one or more functional groups which render the 3' and/or the 5' phosphate ester linkage more labile, i.e., more susceptible to cleavage, than the 3' and/or 5' phosphate ester linkage of a natural nucleotide. The following are examples, without limitation, of such sugar modifications. Other sugar modifications will become readily apparent to those skilled in the art in light of the disclosures herein and are therefore deemed to be within the scope of this invention. In the formulas which follow, B and B' refer to any base, and they may be the same or different.

1. In a deoxyribose-based polynucleotide, replacement of one or more of the deoxyribonucleosides with a ribose analog, e.g., without limitation, substituting adenosine (12) for deoxyadenosine (13), renders the resultant modified polynucleotide susceptible to selective cleavage by chemical bases such as, without limitation, aqueous sodium hydroxide or concentrated ammonium hydroxide, at each point of occurrence of adenosine in the modified polynucleotide (Scheme 5);

2. A 2-ketosugar (14, synthesis: JACS, 1967, 89:2697) may be substituted for the sugar of a deoxynucleotide; upon treatment with chemical base such as, without limitation, aqueous hydroxide, the keto group equilibrates with its ketal form (15), which then attacks the phosphate ester linkage, effecting cleavage (Scheme 6);




Scheme 6 

3. A deoxyribose nucleotide can be replaced with its arabinose analog, i.e., a sugar containing a 2"-hydroxy group (16). Again, treatment with mild (dilute aqueous) chemical base effects the intramolecular displacement of a phosphate ester linkage, resulting in cleavage of the polynucleotide (Scheme 7):



Scheme 7 

4. A deoxyribose nucleotide can be replaced by its 4'-hydroxymethyl analog (17, synthesis: Helv. Chim. Acta, 1966, 79:1980) which, on treatment with mild chemical base such as, without limitation, dilute aqueous hydroxide, likewise displaces a phosphate ester linkage, causing cleavage of the polynucleotide as shown in (Scheme 8):




5. A deoxyribose nucleotide can be replaced by its 4'-hydroxy carbocyclic analog, i.e., a 4-hydroxymethylcyclopentane derivative (18) which, on treatment with aqueous base, results in the cleavage of the polynucleotide at a phosphate ester linkage as shown in (Scheme 9):
Scheme 9

6. A sugar ring may be replaced with its carbocyclic analog which is further substituted with a hydroxyl group (19). Depending on the stereochemical positioning of the hydroxyl group on the ring, either a 3' or a 5' phosphate ester linkage can be selectively cleaved on treatment with mild chemical base (Scheme 10):




Scheme 10 



WO 00/18967 



105 



PCT/US99/22988 



7. In each of examples 1, 3, 4, 5 and 6, above, the hydroxy group which attacks the phosphate ester linkage may be replaced with an amino group (-NH2). The amino group may be generated in situ from the corresponding azidosugar by treatment with tris(2-carboxyethyl)phosphine (TCEP) after the azide-modified polynucleotide has been formed (Scheme 11). The amino group, once formed, spontaneously attacks the phosphate ester linkage, resulting in cleavage.




Scheme 11 

8. A sugar may be substituted with a functional group which is capable of generating a free radical such as, without limitation, a phenylselenyl (PhSe-) or a t-butyl carbonyl (pivaloyl) group (tBuC(=O)-) (Angew. Chem. Int. Ed. Engl., 1993, 32:1742-43). Treatment of the modified sugar with ultraviolet light under anaerobic conditions results in the formation of a C4' radical whose fragmentation causes the excision of the modified nucleotide and thereby the cleavage of the polynucleotide at the modified nucleotide (Scheme 12). The free radicals may be generated either prior to or during the laser desorption/ionization process of MALDI mass analysis. Modified nucleotides with other photolabile 4' substituents such as, without limitation, 2-nitrobenzyl groups or 3-nitrophenyl groups (Synthesis, 1980, 1-26) and bromo or iodo groups may also be used as precursors to form a C4' radical.
9. An electron-withdrawing group may be incorporated into the sugar such that either the nucleotide is rendered susceptible to β-elimination (when W is cyano, a "cyanosugar" 20) or the oxyanion formed by the hydrolysis of the 3'-phosphate ester linkage is stabilized, so that hydrolysis with mild chemical base will be preferred at the modified sugar; such electron-withdrawing groups include, without limitation, cyano (-CN), nitro (-NO2), halo (in particular, fluoro), azido (-N3) or methoxy (-OCH3) (Scheme 13):




Scheme 13 

A cyano sugar can be prepared by a number of approaches, one of which is shown in (Scheme 14). Other methods will no doubt be apparent to those skilled in the art based on the disclosures herein; such alternate approaches to cyano-substituted sugars (or sugars substituted with other electron-withdrawing groups) are within the spirit and scope of this invention.
10. The ring oxygen of a sugar may be replaced with another atom, e.g., without limitation, a nitrogen to form a pyrrolidine ring (21). Or, another heteroatom may be placed in the sugar ring in place of one of the ring carbon atoms, for example, without limitation, a nitrogen atom to form an oxazolidine ring (22). In either case, the purpose of the different or additional heteroatom is to render the phosphate ester linkage of the resulting non-natural nucleotide more labile than that of the natural nucleotide (Scheme 15):
11. A group such as, without limitation, a mercapto group may be incorporated at the 2" position of a sugar ring; on treatment with mild chemical base, this group forms a ring by elimination of the 3'-phosphate ester (Scheme 16).
12. A keto group can be incorporated at the 5' position such that the resulting phosphate has the lability of an anhydride, i.e., structure 23. A nucleotide triphosphate such as 23 may be synthesized by the procedure shown in (Scheme 17). It is recognized that other routes to such nucleotide triphosphates may become apparent to those skilled in the art based on the disclosures herein; such syntheses are within the spirit and scope of this invention.
Polynucleotides into which nucleotide triphosphates of structure 23 have been incorporated should, like analogous mixed anhydrides, be susceptible to alkaline hydrolysis as shown in (Scheme 18):
Scheme 18
13. The phosphate ester linkage could be turned into the relatively more labile enol ester linkage by the incorporation of a double bond at the 5' position; that is, a nucleotide triphosphate of structure 24 could be used. A nucleotide triphosphate of structure 24 can be prepared by the procedure shown in (Scheme 19). It is again understood that other ways to produce structure 24 may be apparent to those skilled in the art based on the disclosures herein; as before, these alternate syntheses are well within the spirit and scope of this invention.
The enol ester would be susceptible to alkaline cleavage according to (Scheme 20).

Scheme 20

14. Difluoro substitution at the 5' position would increase the lability of the phosphate ester linkage and would also push the reaction to completion by virtue of the hydrolysis of the intermediate difluorohydroxy group to an acid group, as shown in (Scheme 22). The dihalo derivative could be synthesized by the procedure shown in (Scheme 21). Once again, the route shown in (Scheme 21) is not the only possible way to make the difluoronucleotide triphosphate. However, as above, these other routes would be apparent based on the disclosures herein and would be within the spirit and scope of this invention.
(3) Phosphate ester modification and cleavage

Modification of the phosphate ester of a nucleotide results in modification of the phosphodiester linkages between the 3'-hydroxy group of one nucleotide and the 5'-hydroxy group of the adjacent nucleotide such that one or the other of the modified 3' or 5' phosphate ester linkages is rendered substantially more susceptible to cleavage than the corresponding unmodified linkage. Since the phosphodiester linkage forms the backbone of a polynucleotide, this modification method will, herein, be referred to alternatively as "backbone modification." The following are non-limiting examples of backbone modification. Other such modifications will become apparent to those skilled in the art based on the disclosures herein and therefore are deemed to be within the scope of this invention.

1. Replacement of an oxygen in the phosphate ester linkage with a sulfur, i.e., creation of a phosphorothiolate linkage (25a, 25b, 25c), which, either directly on treatment with mild base (Schemes 23(a) and 23(b)) or on treatment with an alkylating agent, such as, for instance, methyl iodide, followed by treatment with strong non-aqueous organic base, for example, methoxide (Scheme 23(c)), results in the selective cleavage of the phosphothioester linkage. Alternatively, phosphorothiolate linkages such as those in Formula 14 may also be selectively cleaved through laser photolysis during MALDI mass analysis. This in-source fragmentation procedure (Internat'l J. of Mass Spec. and Ion Process., 1997, 169/170:331-350) consolidates polynucleotide cleavage and analysis into one step;




25a Scheme 23(a)

25b Scheme 23(b)
Scheme 23(c)

2. Replacement of an oxygen in the phosphate ester linkage with a nitrogen, creating a phosphoramidate linkage (26) which, on treatment with, for instance and without limitation, dilute aqueous acid, will result in selective cleavage (Scheme 24);
Scheme 24
3. Replacement of one of the free oxygen atoms attached to the phosphorus of the phosphate backbone with an alkyl group, such as, without limitation, a methyl group, to form a methylphosphonate linkage, which, on treatment with strong non-aqueous organic base, such as, without limitation, methoxide, will likewise result in selective cleavage (Scheme 25).




Scheme 25 



4. Alkylation of the free oxyanion of a phosphate ester linkage with an alkyl group such as, without limitation, a methyl group will, on treatment with strong non-aqueous organic base such as, without limitation, methoxide, result in the selective cleavage of the resulting alkyl phosphotriester linkage (Scheme 26).
Scheme 26

5. Treatment of a phosphorothioate with β-mercaptoethanol in a strong base such as, without limitation, methanolic sodium methoxide, in which the mercaptoethanol exists primarily as the disulfide, could result in the formation of a mixed disulfide, which would then degrade, with or without rearrangement, to give the cleavage products shown in (Scheme 27).




Scheme 27

(4) Dinucleotide modification and cleavage 

The previous substitutions are all single substitutions; that is, one modified nucleotide is substituted for one natural nucleotide wherever the natural nucleotide occurs in the target polynucleotide or, if desired, at a fraction of such sites. In an additional aspect of this invention, multiple substitutions may be used. That is, two or more different modified nucleotides may be substituted for two or more different natural nucleotides, respectively, wherever the natural nucleotides occur in a subject polynucleotide. The modified nucleotides and cleavage conditions are selected such that, under the proper cleavage conditions, the modified nucleotides do not individually confer selective cleavage properties on a polynucleotide. When, however, the proper cleavage conditions are applied and the modified nucleotides are incorporated into the polynucleotide in a particular spatial relationship to one another, they interact to jointly render the polynucleotide selectively cleavable. Preferably, two modified nucleotides are substituted for two natural nucleotides in a polynucleotide; thus, this method is referred to herein as "dinucleotide modification." It is important to note that, individually, each of the two modified nucleotides may elicit specific and selective cleavage of a polynucleotide, albeit under quite different, typically more vigorous, chemical conditions.

As used herein, "spatial relationship" refers to the 3-dimensional relationship between two or more modified nucleotides after substitution into a polynucleotide. In a preferred embodiment of this invention, two modified nucleotides must be contiguous in a modified polynucleotide in order to impart altered cleavage properties on the modified polynucleotide. By employing two modified nucleotides in this manner, and then cleaving the modified polynucleotide, the relationship between two natural nucleotides in a target polynucleotide can be established, depending on the nature of the multiple substitution selected. That is, the natural nucleotides being replaced must also have been adjacent to one another in the natural polynucleotide. For example, without limitation, if a modified A and a modified G are substituted at every point of occurrence of the corresponding natural A and natural G, respectively, the modified polynucleotide will be rendered selectively cleavable only where the natural A and G were directly adjacent, i.e., AG or GA (but not both), in the naturally-occurring polynucleotide. As shown below, proper choice of the modified nucleotides will also reveal the exact relationship of the nucleotides, i.e., in the example above, whether the nucleotide sequence in the natural polynucleotide was AG or GA. The following are non-limiting examples of multiple substitutions. Other multiple substitutions will become apparent to those skilled in the art based on the disclosures set forth herein and therefore are deemed to be within the scope of this invention.
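The logic of dinucleotide modification, in which cleavage occurs only where the two modified nucleotides are contiguous and in one particular 5'-3' order, can be sketched as a string operation. This is an idealized model with hypothetical function names: it assumes complete cleavage at the phosphodiester between the 5' member (`first`) and the 3' member (`second`) of each qualifying pair, with no nucleotide lost.

```python
def dinucleotide_cut_sites(seq: str, first: str, second: str) -> list[int]:
    """Indices i with seq[i-1] == first and seq[i] == second; the cut lies just before i."""
    return [i for i in range(1, len(seq)) if seq[i - 1] == first and seq[i] == second]


def cleave_dinucleotide(seq: str, first: str, second: str) -> list[str]:
    """Split seq (written 5'->3') between every contiguous first/second pair."""
    bounds = [0] + dinucleotide_cut_sites(seq, first, second) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    target = "AAGTGAAGC"
    print(cleave_dinucleotide(target, "A", "G"))  # ['AA', 'GTGAA', 'GC'] (5'-AG-3' sites only)
    print(cleave_dinucleotide(target, "G", "A"))  # ['AAGTG', 'AAGC'] (5'-GA-3' site only)
```

Comparing the two outputs shows how assigning the modifications to AG versus GA orientation distinguishes the two dinucleotides, as described in the text.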

1. One modified nucleotide may contain a functional group capable of effecting nucleophilic substitution while the companion modified nucleotide is modified so as to render it a selective leaving group. The nucleophile and the leaving group may be in a 5'-3' orientation or in a 3'-5' orientation with respect to one another. A non-limiting example of this is shown in (Scheme 28). The 2' or 2" hydroxy group on one modified nucleotide, when treated with mild chemical base, becomes a good nucleophile. The other modified nucleotide contains a 3' or 5' thiohydroxy (-SH) group which forms a 3' or 5' phosphorothiolate linkage when incorporated into the modified polynucleotide. This phosphorothiolate linkage is selectively more labile than a normal phosphodiester linkage. When treated with mild base, the oxyanion formed from the hydroxy group of one modified nucleotide selectively displaces the thiophosphate linkage to the other modified nucleotide, resulting in cleavage. As shown in Schemes 28(a) and 28(b), depending on the stereochemical relationship between the hydroxy group and the thiophosphate linkage, cleavage will occur either to the 3' or to the 5' side of the hydroxy-containing modified nucleotide. Thus, the exact relationship of the natural nucleotides in the naturally-occurring polynucleotide is revealed.




Scheme 28(a) 
Scheme 28(b)



2(a). If one modified nucleotide contains a 3' or 5' amino (-NH2) group and
the other modified nucleotide contains a 5' or 3' hydroxy group, respectively,
treatment of the resulting phosphoramidate-linked polynucleotide with mild acid
results in protonation of the amino group of the phosphoramidate linkage, which
then becomes a very good leaving group. Once again, depending on the spatial
relationship between the hydroxy group of one modified nucleotide and the amino
group of the other modified nucleotide, the exact relationship of the nucleotides in
the naturally-occurring polynucleotide can be determined, as shown in Schemes
29(a) and 29(b).
Scheme 29(b)

Dinucleotide cleavage of a ribonucleotide/5'-aminonucleotide 5'-3' linkage is
presently a preferred embodiment of this invention. Examples of this method are
shown in Figures 21-26.

2(b). When the amino group of the modified nucleotide is at the 5' position, a
ribonucleotide/5'-amino-2',5'-dideoxynucleotide pair may be cleaved during the
polymerization process. For example, without limitation, cleavage occurs during the
incorporation of adenine ribonucleotide and 5'-amino-2',5'-dideoxythymidine nucleotide
into a polynucleotide using a combination of wild-type Klenow (exo-) and mutant E710A
Klenow (exo-) polymerases. E710A is a mutant Klenow (exo-) polymerase in which the
glutamate at residue 710 has been replaced by alanine. The E710A mutant is more
efficient than Klenow (exo-) at incorporating both ribonucleotides and
deoxyribonucleotides into a single nascent polynucleotide strand. Other polymerases
with similar properties will be apparent to those skilled in the art based on the
disclosures herein, and their use for the incorporation of a ribonucleotide and a
5'-amino-2',5'-dideoxynucleotide into a polynucleotide with subsequent cleavage during
the polymerization reaction is within the scope of this invention.

When a 5'-end radiolabeled primer was extended using a mixture of Klenow
(exo-) and E710A Klenow (exo-), only one fragment (the 5'-end fragment) was
observed, indicating complete cleavage at the ribonucleotide-5'-aminonucleotide sites.
We have shown (Figs. 21-26) that polymerization and cleavage occur in the
same step; that is, cleavage is induced during protein-DNA contact. The figures also
show that the polymerases continue to extend the nascent strand even after cleavage,
which further suggests that the cleavage is the result of protein-DNA contact. While
USB brand Klenow polymerase (Amersham) was also able to incorporate the two
nucleotides, it was not as efficient as the mixture of polymerases and, furthermore,
multiple product bands were observed, indicating incomplete cleavage at the AT sites.

The above is, of course, a specific example of a general concept. That is, other
wild-type polymerases, mutant polymerases or combinations thereof should likewise be
capable of cleaving, or facilitating cleavage of, modified nucleotides or dinucleotides
during the polymerization procedure. Based on the disclosures herein, the procedure
for determining the exact combinations of polymerase(s) and nucleotide modifications
that result in cleavage will be apparent to those skilled in the art. For instance, as is
described below, it may be useful to generate a library of mutant polymerases and
select specifically for those which induce dinucleotide cleavage. Thus, a polymerase
or a combination of polymerases which causes cleavage of a forming modified
polynucleotide during the polymerization process is yet another aspect of this
invention, as are the method of cleaving a modified polynucleotide during the
polymerization process using a polymerase or combination of polymerases, and the
modified nucleotide(s) necessary for the cleavage to occur.

3. An electron-withdrawing group can be placed on a sugar carbon adjacent
to the carbon which is bonded to the hydroxy group participating in the ester linkage of a
methylphosphonate (Scheme 30(a)) or methylphosphotriester (Scheme 30(b)) backbone.
This will increase the stability of the oxyanion formed when the phosphate group
is hydrolyzed with mild chemical base (Scheme 30) and thus result in selective
hydrolysis of those phosphate ester linkages compared to phosphate ester linkages
not adjacent to such hydroxy groups.
Scheme 30(a)
Scheme 30(b)

4. An electron-withdrawing group can be placed on the 4' carbon of a
nucleotide which is linked through its 5'-hydroxy group to the 3'-hydroxy group of an
adjacent ribonucleotide. Treatment with dilute base will result in cleavage as shown
in Scheme 31.




Scheme 31

5. A 2' or 4' leaving group in a sugar may be susceptible to attack by the
sulfur of a phosphorothioate, as shown in Schemes 32 and 33, to afford the desired
cleavage:

Scheme 32
6. Ethylene sulfide could effect the cleavage of a 2'-fluoro derivative of a
sugar next to a phosphorothioate according to Scheme 34:

β-Mercaptoethanol or a similar reagent may be substituted for the ethylene sulfide.

7. A phosphorothioate might coordinate with a metal oxidant such as,
without limitation, Cu(II) or Fe(III), which would be held in close proximity to the 2'
hydroxy group of an adjacent ribonucleotide. Selective oxidation of the 2' hydroxy
group to a ketone should render the adjacent phosphate ester linkage more
susceptible to cleavage under basic conditions than the corresponding linkages of
unoxidized ribonucleotides or deoxyribonucleotides, as shown in Scheme 35:
Scheme 35

The preceding cleavage reactions may be carried out so as to cause cleavage
at substantially all points of occurrence of the modified nucleotide or, in the case of
multiple substitutions, at all points of occurrence of two or more modified nucleotides
in the proper spatial relationship. Alternatively, by controlling the amount of cleaving
reagent and the reaction conditions, cleavage can be made partial; i.e., cleavage will
occur at only a fraction of the points of occurrence of a modified nucleotide or pair of
modified nucleotides.
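A toy Python model can make the difference between complete and partial cleavage concrete. It assumes, for illustration only, that each site is cut on the 3' side of the modified nucleotide and that every site in every molecule is cut independently with the same probability; `cleave_population` and its `fraction` parameter are hypothetical stand-ins for the amount of cleaving reagent and the reaction conditions.

```python
import random
from collections import Counter


def cleave_population(seq: str, base: str, fraction: float,
                      molecules: int = 1000, seed: int = 0) -> Counter:
    """Toy model of cleavage on the 3' side of `base` in many copies of `seq`.

    Each site in each molecule is cut independently with probability
    `fraction` (a hypothetical stand-in for reagent amount and conditions):
    1.0 models complete cleavage, smaller values model partial cleavage.
    Returns how many times each fragment was produced.
    """
    rng = random.Random(seed)
    # A cut after the last base would not create a new fragment, so skip it.
    sites = [i for i, b in enumerate(seq) if b == base and i < len(seq) - 1]
    counts = Counter()
    for _ in range(molecules):
        cuts = [i + 1 for i in sites if rng.random() < fraction]
        bounds = [0] + cuts + [len(seq)]
        counts.update(seq[a:b] for a, b in zip(bounds, bounds[1:]))
    return counts


print(sorted(cleave_population("ACGTTGCA", "G", 1.0, molecules=10)))
# ['ACG', 'CA', 'TTG']
print(sorted(cleave_population("ACGTTGCA", "G", 0.5, seed=1)))
```

With `fraction=0.5`, the overlapping products ACGTTG and TTGCA and the uncut strand appear alongside the complete-cleavage fragments, which is the kind of partial-cleavage set discussed in this section.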
B. Fragmenting Modified Polynucleotides in Mass Spectrometers

The preceding discussion relates to chemical methods for cleaving
polynucleotides at sites where modified nucleotides have been incorporated. However,
besides fragmenting polynucleotide molecules chemically in solution, it is a further
aspect of this invention that fragmentation is accomplished within a mass spectrometer
using chemical or physical means. Further, by manipulating the conditions within the
mass spectrometer, the extent of fragmentation can be controlled. The ability to
control the degree of fragmentation of chemically modified oligonucleotides can be very
useful in determining relationships between adjacent sequences. This is because,
while mass spectrometric (MS) analysis of a completely cleaved polynucleotide
provides the masses and therefore the nucleotide content of each fragment
polynucleotide, determining the order in which these fragment polynucleotides are
linked together in the original (analyte) polynucleotide is a difficult problem. By relaxing
the stringency of cleavage, one can generate fragments that correspond to two or more
fragments from the complete cleavage set. The mass of such a compound fragment
permits the inference that its component fragments are adjacent in the original
polynucleotide. By determining that multiple different pairs or triplets of complete
cleavage fragments are adjacent to each other, a much larger sequence can eventually
be pieced together than if one had to rely solely on analysis of complete cleavage
fragments. The ability to control the conditions of fragmentation within the mass
spectrometer is particularly advantageous because, in contrast to the iterative
generation and subsequent testing of partial cleavages in a test tube, the effect of
various partial cleavage conditions can be observed directly in real time and adjusted
immediately to provide the optimal partial cleavage data set(s). For some purposes,
use of several partial cleavage conditions may be very useful, as successive levels of
partial cleavage will provide a cumulative picture of the relationships between ever
larger fragments. Specific mechanisms for fragmentation of modified polynucleotides
are described below.
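The adjacency inference can be sketched in Python under simple assumptions: fragment masses are modeled as Table 2 residue masses plus a single end-group mass (18.0 Da is used here only as an assumed placeholder), so a compound fragment weighs m1 + m2 minus one end group. The names `oligo_mass` and `adjacent_pairs` are hypothetical.

```python
from itertools import combinations

# Residue masses (Da) from Table 2.
RESIDUE = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}


def oligo_mass(seq: str, end_group: float = 18.0) -> float:
    """Summed residue masses plus one end-group mass (assumed value)."""
    return sum(RESIDUE[b] for b in seq) + end_group


def adjacent_pairs(fragments: list[str], observed: list[float],
                   tol: float = 0.5, end_group: float = 18.0) -> list[tuple[str, str]]:
    """Return fragment pairs whose joined mass matches an observed partial-cleavage peak.

    Joining two fragments removes one copy of the end group, so the compound
    mass is m1 + m2 - end_group. Mass alone shows that the pair is adjacent,
    not which member lies 5' of the other.
    """
    masses = [oligo_mass(f, end_group) for f in fragments]
    hits = []
    for (f1, m1), (f2, m2) in combinations(zip(fragments, masses), 2):
        joined = m1 + m2 - end_group
        if any(abs(joined - m) <= tol for m in observed):
            hits.append((f1, f2))
    return hits


complete = ["ACG", "TTG", "CA"]                        # complete G-cleavage set
partial = [oligo_mass("ACGTTG"), oligo_mass("TTGCA")]  # compound peaks
print(adjacent_pairs(complete, partial))  # [('ACG', 'TTG'), ('TTG', 'CA')]
```

Chaining the two hits (ACG next to TTG, TTG next to CA) recovers the fragment order ACG, TTG, CA up to overall orientation; a larger data set would also need triplets and a strategy for ambiguous matches.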

First, by choice of appropriate ionization methods, fragmentation can be
induced during the ionization process. Alternatively, in the tandem mass
spectrometry (MS/MS) approach, ions with mass-to-charge ratios (m/z) of interest
can be selected and then activated by a variety of procedures, including collision with
molecules, ions or electrons, or the absorption of photons of various wavelengths,
leading to fragmentation of the ions. In one aspect, ionization and fragmentation
of the polynucleotide molecules can be achieved with fast atom bombardment
(FAB). In this approach, modified polynucleotide molecules are dissolved in a liquid
matrix such as glycerol, thioglycerol, or other glycerol analogs. The solution is
deposited on a metallic surface. Particles with thousands of electron volts of kinetic
energy are directed at the liquid droplet. Depending on the modification of the
polynucleotides, partial fragmentation or complete fragmentation at every modified
nucleotide can be achieved.

In another aspect, ionization and fragmentation can be effected by
matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS). In MALDI-MS,
a solution of modified polynucleotide molecules is mixed with a matrix solution, e.g.,
3-hydroxypicolinic acid in aqueous solution. An aliquot of the mixture is deposited
on a solid support, typically a metallic surface with or without modification. Lasers,
preferably with wavelengths between 3 μm and 10.6 μm, are used to irradiate the
modified polynucleotide/matrix mixture. To analyze in-source fragmentation (ISF)
products, delayed extraction can be employed. To analyze post-source decay
(PSD) products, an ion reflector can be employed.

In another approach, ionization and fragmentation can be accomplished by
electrospray ionization (ESI). In this procedure, the solution of modified DNA is
sprayed through the orifice of a needle to which a few kilovolts are applied.
Fragmentation of the modified polynucleotide molecules occurs during the
desolvation process in the nozzle-skimmer (NS) region. The degree of
fragmentation will depend on the nature of the modification as well as on factors such
as the voltage between the nozzle and skimmer, the flow rate, and the
temperature of the drying gas. If a capillary is used to assist desolvation, then it
is the voltage between the exit of the capillary and the skimmer, and the temperature
of the capillary, that need to be controlled to achieve the desired degree of
fragmentation.

In yet another technique, modified polynucleotide molecules can be
selectively activated and dissociated. Activation can be accomplished by
accelerating precursor ions to a kinetic energy of a few hundred to a few million
electron volts and then causing them to collide with neutral molecules, preferably of
a noble gas. In the collision, some of the kinetic energy of the precursor ions is
converted into internal energy and causes fragmentation. Activation can also be
accomplished by allowing accelerated precursor ions to collide with a conductive or
semi-conductive surface. Activation can also be accomplished by allowing
accelerated precursor ions to collide with ions of opposite polarity. In another
approach, activation can be accomplished by electron capture. In this technique,
the precursor ions are allowed to collide with thermalized electrons. Activation can
also be accomplished by irradiating the precursor ions with photons of various
wavelengths, preferably in the range of 193 nm to 10.6 μm. Activation can also be
accomplished by heating the vacuum chamber for trapped ions; heating of the vacuum
chamber walls causes blackbody IR irradiation (Williams, E. R., Anal. Chem., 1998,
70:179A-185A). The presence of modified nucleotides in a polynucleotide could
also increase the rate constant of the fragmentation reaction, shortening the
10-1000 second duration required by the blackbody IR irradiation approach for
unmodified polynucleotides.

As noted previously, tandem mass spectrometry is another tool that may be
beneficially employed with the methods of this invention. In tandem mass
spectrometry, precursor ions with m/z of interest are selected and subjected to
activation. Depending on the activation technique employed, some or all of the
precursor ions can be fragmented to give product ions. When this is done inside a
suitable mass spectrometer (e.g., Fourier-transform ion cyclotron resonance mass
spectrometers and ion trap mass spectrometers), the product ions with m/z of interest
can be further selected and subjected to activation and fragmentation, giving more
product ions. The masses of both precursor and product ions can be determined.

To control the degree of fragmentation at different stages of activation, two or
more different types of modified nucleotides (which, for purposes of discussion, will be
called Type I and Type II) with different sensitivities to different activation techniques
could be incorporated (with complete replacement of the corresponding natural
nucleotides) into a target polynucleotide. Such a polynucleotide can be fragmented with
high efficiency by a Type I activation technique at every position where Type I modified
nucleotides are incorporated. The resulting fragment ions, which still contain Type II
modified nucleotides, can then be selected and fragmented by a Type II activation
technique to generate a set of sub-fragments from which nucleotide content can be
more readily inferred. Such an approach can be useful for variance detection. For
example, a 500-mer polynucleotide can first be fragmented into 10-50 fragments using
a Type I fragmentation technique. The m/z of each fragment (when compared to the
predicted set of fragment masses) will reveal whether a variance resides in that
fragment. Once fragments containing a variance are identified, the rest of the fragment
ions are ejected from the ion trapping device, while the fragment ions of interest are
subjected to activation. By controlling the degree of fragmentation of these fragment
ions, a set of smaller DNA fragments can be generated, allowing the order of the
nucleotides and the position of the variance to be determined. Compared to the
approach involving one type of modified nucleotide and one-stage fragmentation,
such an approach has the advantage that the number of experimental steps and
the amount of data that need to be processed are significantly reduced. Compared
to the approach involving one type of modified nucleotide but two stages of partial
fragmentation, this approach has the advantage that the fragmentation efficiency
at the second stage is more controllable, hence reducing the chance of sequence
gaps.
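The first (Type I) stage of this variance screen can be sketched as a comparison of predicted and observed fragment masses. The sketch assumes complete cleavage on the 3' side of each modified nucleotide, uses summed Table 2 residue masses without end groups, and uses a hypothetical T-to-C change as the variance; the second (Type II) stage, in which the flagged fragment ion is isolated and fragmented further, is not modeled. All function names are illustrative.

```python
# Residue masses (Da) from Table 2.
RESIDUE = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}


def cleave_after(seq: str, base: str) -> list[str]:
    """Complete cleavage on the 3' side of every `base` (illustrative rule)."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def residue_mass(fragment: str) -> float:
    """Sum of Table 2 residue masses, ignoring end groups."""
    return sum(RESIDUE[b] for b in fragment)


def fragments_with_variance(reference: str, observed: list[float],
                            base: str, tol: float = 0.5) -> list[str]:
    """First-stage screen: reference fragments whose predicted mass is not observed."""
    return [f for f in cleave_after(reference, base)
            if not any(abs(residue_mass(f) - m) <= tol for m in observed)]


reference = "ACGTTGCATTAG"
sample = "ACGTCGCATTAG"  # hypothetical T->C variance
observed = [residue_mass(f) for f in cleave_after(sample, "G")]
print(fragments_with_variance(reference, observed, "G"))  # ['TTG']
```

Here the flagged fragment TTG is shifted by 15 Da in the sample, the T/C difference listed in Table 2; only that fragment ion would be carried into the second activation stage.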

Although the aforementioned schemes of activation can be applied to all
kinds of mass spectrometers, ion-trap mass spectrometers (ITMS) and
Fourier-transform ion cyclotron resonance mass spectrometers (FT-ICRMS) are
particularly suited to the electron capture, photon activation, and blackbody IR
irradiation approaches.

C. Modified Nucleotide Incorporation 

Several examples of the polymerase-catalyzed incorporation of a modified
nucleotide into polynucleotides are described in the Examples section, below. It may
be, however, that a particular polymerase will not incorporate all of the modified
nucleotides described above, or others like them which are within the scope of this
invention, with the same ease and efficiency. Also, while a particular polymerase
may be capable of incorporating one modified nucleotide efficiently, it may be less
efficient in incorporating a second modified nucleotide directly adjacent to the first
modified nucleotide. Furthermore, currently available polymerases may not be
capable of inducing or facilitating cleavage at modified nucleotides or nucleotide
pairs, an extremely convenient way to achieve cleavage (see above). There are,
however, several approaches to acquiring polymerases that are capable of
incorporating the modified nucleotides and contiguous pairs of modified nucleotides
of this invention and, potentially, of inducing or facilitating specific cleavage at that
modified nucleotide or those modified nucleotides.

One approach to finding polymerases with the proper capabilities is to take
advantage of the diversity inherent among naturally-occurring polymerases
including, without limitation, RNA polymerases, DNA polymerases and reverse
transcriptases. Naturally-occurring polymerases are known to have differing
affinities for non-natural nucleotides, and it is likely that a natural polymerase which
will perform the desired incorporation can be identified. In some cases, use of a
mixture of two or more naturally-occurring polymerases having different properties
regarding the incorporation of one or more non-natural nucleotides may be
advantageous. For example, W. Barnes has reported (Proc. Natl. Acad. Sci. USA,
1994, 91:2216-2220) the use of two polymerases, an exonuclease-free N-terminal
deletion mutant of Taq DNA polymerase and a thermostable DNA polymerase
having 3'-exonuclease activity, to achieve improved polymerization of long DNA
templates. Naturally-occurring polymerases from thermophilic organisms are
preferred polymerases for applications in which amplification by thermal cycling,
e.g., PCR, is the most convenient way to produce modified polynucleotides.

Another approach is to employ current knowledge of polymerase
structure-function relationships (see, e.g., Delarue, M., et al., Protein Engineering,
1990, 3:461-467; Joyce, C. M., Proc. Natl. Acad. Sci. USA, 1997, 94:1619-1622) to
identify, or aid in the rational design of, a polymerase which can accomplish a
particular modified nucleotide incorporation. For example, the amino acid residues
of DNA polymerases that provide specificity for deoxyribonucleoside triphosphates
(dNTPs) while excluding ribonucleoside triphosphates (rNTPs) have been examined
in some detail. Phenylalanine residue 155 of Moloney murine leukemia virus
reverse transcriptase appears to provide a steric barrier that blocks entry of
rNTPs. A similar role is played by phenylalanine residue 762 of the Klenow
fragment of E. coli DNA polymerase I, and by tyrosine residue 115 of HIV-1 reverse
transcriptase. Mutation of this latter amino acid, or its equivalent, in several different
polymerases has the effect of altering polymerase fidelity and sensitivity to
nucleotide inhibitors.
The corresponding site in RNA polymerases has also been investigated and
appears to play a similar role in discriminating ribo- from deoxyribonucleotides. For
example, it has been shown that mutation of tyrosine 639 of T7 RNA polymerase to
phenylalanine reduces the specificity of the polymerase for rNTPs by about 20-fold
and almost eliminates the Km difference between rNTPs and dNTPs. The result is
that the mutant T7 RNA polymerase can polymerize a mixed dNTP/rNTP chain.
See, e.g., Huang, Y., Biochemistry, 1997, 36:13718-13728. These results illustrate
the use of structure-function information in the design of polymerases that will readily
incorporate one or more modified nucleotides.

In addition, chemical modification or site-directed mutagenesis of specific
amino acids, or genetic engineering, can be used to create truncated, mutant or
chimeric polymerases with particular properties. For example, chemical modification
has been used to modify T7 DNA polymerase (Sequenase®, Amersham) to
increase its processivity and affinity for non-natural nucleotides (Tabor, S., et al.,
Proc. Natl. Acad. Sci. USA, 1987, 84:4767-4771). Likewise, site-directed
mutagenesis has been employed to examine how E. coli DNA polymerase I (Klenow
fragment) distinguishes between deoxy- and dideoxynucleotides (Astatke, M., et al.,
J. Mol. Biol., 1998, 278:147-165).

Furthermore, development of a polymerase with optimal characteristics can
be accomplished by random mutagenesis of one or more known polymerases
coupled with an assay which manifests the desired characteristics in the mutated
polymerase. A particularly useful procedure for performing such mutagenesis is
called "DNA shuffling" (see Harayama, S., Trends Biotechnol., 1998, 16:76-82). For
example, using only three rounds of DNA shuffling and assaying for β-lactamase
activity, a variant with 16,000-fold higher resistance to the antibiotic cefotaxime than
the wild-type gene was created (Stemmer, W. P. C., Nature, 1994, 370:389-391).

A novel procedure, which is a further aspect of this invention, for creating and
selecting polymerases capable of efficiently incorporating a modified nucleotide or
contiguous pair of modified nucleotides of this invention is described in the
Examples section, below.
D. Fragment Analysis

Once a modified nucleotide or nucleotides have been partially or completely
substituted for one or more natural nucleotides in a polynucleotide and cleavage of
the resultant modified polynucleotide has been accomplished, analysis of the
fragments obtained can be performed. If the goal is complete sequencing of a
polynucleotide, the above-mentioned partial incorporation of modified nucleotides
into a polynucleotide, or partial cleavage of a completely modified-nucleotide-substituted
polynucleotide, may be used to create fragment ladders similar to those
obtained when using the Maxam-Gilbert or Sanger procedures. In such a case, a
sequencing ladder can then be constructed using slab, capillary or miniaturized gel
electrophoresis techniques. The advantage of the method of this invention over the
Maxam-Gilbert procedure is that the placement of the modified nucleotides in the
modified polynucleotide is precise, as is cleavage, whereas post-synthesis
modification of a full-length polynucleotide by the Maxam-Gilbert reactions is
susceptible to error. For example, the wrong nucleotides might be modified and thus
the wrong cleavage may occur, or the intended nucleotides may not be modified at
all, such that there may be insufficient, perhaps even no, cleavage where cleavage
would be expected to occur. The advantages over the Sanger procedure are
several. First, the full-length clone can be purified after extension and prior to
cleavage so that prematurely terminated fragments due to stops caused by
polymerase error or template secondary structure can be removed before gel
electrophoresis, resulting in cleaner cleavage bands. In fact, it may not even be
necessary to perform such clean-up, in that the prematurely terminated polymerase
extension fragments themselves will be cleaved if they contain a modified nucleotide,
and those correctly cleaved fragments will simply augment the other fragments
obtained from the cleavage of the full-length clone (although such augmentation is
confined to fragments shorter than the site of premature termination). Second, the
chemical method produces sequence ladder products of equal intensity, in contrast to
dye-terminator sequencing, where substantial differences in the characteristics of
different dye terminator molecules or in the interaction of dye-modified
dideoxynucleotides with polymerase-template complexes result in uneven signal
intensity in the resulting sequence ladders. Such differences can lead to errors and
make heterozygote identification difficult. Third, the chemical methods described
herein allow production of homogeneous sequence ladders over distances of
multiple kb, in contrast to the Sanger chain-terminating method, which generates
usefully labeled fragments over a substantially shorter interval. This is demonstrated
in Figs. 17 and 18. The production of long sequence ladders can be coupled with
restriction endonuclease digestion to accomplish 1X sequencing of long templates.
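Reading a sequence from partial-cleavage ladders can be sketched as follows, assuming a 5'-end label, cleavage on the 3' side of each modified nucleotide and one resolvable band per position. The function names are hypothetical, and band intensities and electrophoretic mobility corrections are ignored.

```python
def ladders_from_sequence(seq: str) -> dict[str, set[int]]:
    """Simulated ladders: lengths of 5'-labeled fragments cut 3' of each base."""
    ladders = {b: set() for b in "ACGT"}
    for pos, base in enumerate(seq, start=1):
        ladders[base].add(pos)
    return ladders


def read_ladders(ladders: dict[str, set[int]]) -> str:
    """Assemble a sequence from four ladders of labeled-fragment lengths.

    A fragment of length n in ladder X is read as "position n is X".
    """
    calls = {}
    for base, lengths in ladders.items():
        for n in lengths:
            if n in calls:
                raise ValueError(f"position {n} called as both {calls[n]} and {base}")
            calls[n] = base
    length = max(calls, default=0)
    gaps = [n for n in range(1, length + 1) if n not in calls]
    if gaps:
        raise ValueError(f"no band at positions {gaps}")
    return "".join(calls[n] for n in range(1, length + 1))


print(read_ladders(ladders_from_sequence("ACGTTGCA")))  # ACGTTGCA
```

The explicit gap and conflict checks mirror the failure modes discussed above: a missing band (insufficient cleavage) or a position called twice (wrong cleavage) stops the read instead of silently producing a wrong base.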

The utility of this approach to sequencing genomic DNA is described in Fig. 
14 and its execution in Figs. 15 and 16. These methods have particular utility in the 
sequencing of repeat-rich genomes such as, without limitation, the human genome. 

A particular advantage of the methods described herein for the use of mass
spectrometry for polynucleotide sequence determination is the speed,
reproducibility, low cost and automation associated with mass spectrometry,
especially in comparison to gel electrophoresis. See, e.g., Fu, D. J., et al., Nature
Biotechnology, 1998, 16:381-384. Thus, although some aspects of this invention
may employ gel analysis, those that use mass spectrometry are preferred
embodiments.

When detection of variance between two or more related polynucleotides is
the goal, the ability of mass spectrometry to differentiate masses within a few or
even one atomic mass unit (amu) of each other permits such detection without the
need to determine the complete nucleotide sequences of the polynucleotides
being compared; i.e., the masses of the oligonucleotides provide their nucleotide
content. The use of mass spectrometry in this manner constitutes yet another
aspect of this invention.

This use of mass spectrometry to identify and determine the chemical nature
of variances is based on the unique molecular weight characteristics of the four
deoxynucleotides and their oligomers.
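Inferring nucleotide content from a single mass can be sketched as a brute-force search over base compositions. The sketch assumes the measured mass has already been converted to a sum of Table 2 residue masses (end-group contributions removed) and uses a hypothetical 0.5 Da tolerance; `compositions_for_mass` is an illustrative name.

```python
from itertools import combinations_with_replacement

# Residue masses (Da) from Table 2.
RESIDUE = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}


def compositions_for_mass(target: float, tol: float = 0.5, max_len: int = 8) -> list[str]:
    """List base compositions whose summed residue mass lies within `tol` of `target`.

    Compositions are written in alphabetical order (e.g. 'ACG'); the order of
    bases along the strand cannot be recovered from mass alone.
    """
    hits = []
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement("ACGT", n):
            if abs(sum(RESIDUE[b] for b in combo) - target) <= tol:
                hits.append("".join(combo))
    return hits


print(compositions_for_mass(931.6))  # ['ACG']
```

A mass that returns more than one composition is isobaric at that tolerance and cannot be assigned from its mass alone.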

Table 2 shows the mass differences among the four deoxynucleotide
monophosphates. Table 3A then shows the calculated masses of all possible 2mers,
3mers, 4mers and 5mers by nucleotide composition alone; that is, without
consideration of nucleotide order. As can be seen, only two of the 121 possible
2mer through 5mer oligonucleotides have the same mass. Thus, the nucleotide
composition of all 2mers, 3mers, 4mers and all but two 5mers created by cleavage
of a polynucleotide can be immediately determined by mass spectrometry using an
instrument with sufficient resolving power. For the masses in Table 3A, an
instrument with a resolution (full width at half-maximal height) of 1500 to 2000 would
be sufficient; mass spectrometers with resolution up to 10,000 are commercially
available. However, when cleavage is performed at all sites of modified nucleotide
substitution, it is not necessary to consider the masses of all possible 2mers, 3mers,
4mers, etc. This is because there can be no internal occurrences of the cleavage
nucleotide in any cleavage fragment. That is, if G is the cleavage nucleotide, then
all resulting cleavage fragments will have 0 or 1 G, depending on the cleavage
mechanism, and, if there is 1 G, that G must occur at either the 3' or the 5' end of the
fragment, again depending on the cleavage mechanism. Put another way, there cannot
be a G internal to a fragment because, if there were, that fragment would necessarily be
refragmented at the internal G. Thus, if the cleavage chemistry does leave a G on
either end of all G-cleavage fragments, then the mass of G can be subtracted from
the mass of each fragment and the resulting masses can be compared. The same
can be done with A, C and T. Table 4 shows the masses of all 2mers through 7mers
lacking one nucleotide. This calculation has been performed for polynucleotides up
to 30mers, and it has been shown that there are only 8 sets of isobaric
oligonucleotides (oligonucleotides with masses within 0.01% of each other) below a
mass of 5000 Da. The eight sets of isobaric oligonucleotides are shown in Table 3B.
Inspection of Table 3B reveals that every set except Set 2 involves a polynucleotide
with multiple G residues. Thus, cleavage at G would eliminate all isobaric masses
except one, d(T8) vs. d(C3A5), which could not be resolved by mass spectrometry with
a resolution of 0.01%. However, either C or A cleavage would remove the latter
polynucleotide.
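The count of 121 compositions and the single isobaric pair among 2mers through 5mers can be checked with a short enumeration. This sketch uses the Table 2 residue masses as exact integers (in tenths of a dalton) and groups only exactly equal masses; reproducing the 0.01% window used for Table 3B would require a tolerance-based grouping instead. `isobaric_groups` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# Table 2 residue masses in tenths of a dalton, kept as integers so that
# equal sums compare exactly (313.2 Da -> 3132).
RESIDUE_DECI_DA = {"A": 3132, "C": 2892, "G": 3292, "T": 3042}


def isobaric_groups(min_len: int, max_len: int, alphabet: str = "ACGT"):
    """Count compositions of length min_len..max_len and group those sharing a mass."""
    by_mass = defaultdict(list)
    total = 0
    for n in range(min_len, max_len + 1):
        for combo in combinations_with_replacement(sorted(alphabet), n):
            total += 1
            by_mass[sum(RESIDUE_DECI_DA[b] for b in combo)].append("".join(combo))
    return total, [names for names in by_mass.values() if len(names) > 1]


print(isobaric_groups(2, 5))         # (121, [['AAAAA', 'CCGGG']])
print(isobaric_groups(8, 8, "ACT"))  # (45, [['AAAAACCC', 'TTTTTTTT']])
```

The second call restricts the alphabet to A, C and T, mimicking fragments that lack G, and recovers the d(T8) vs. d(C3A5) pair noted above.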

Table 4 shows that cleavage at A or T consistently produces fragments with
larger mass differences between the closest possible cleavage fragments.
Cleavage at A produces mass differences of 5, 10, 15, 20 or 25 Da between the
closest fragments, while cleavage at T affords mass differences of 8, 16 or 24 Da,
albeit at the expense of a few more isobaric fragments.
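The spacing pattern can be explained with a short calculation: for equal-length fragments that lack one nucleotide, every mass difference is a whole-number combination of the pairwise differences among the three remaining residues, and is therefore a multiple of their greatest common divisor. The sketch below computes that step size from the Table 2 masses; it gives the granularity of possible differences, not the closest pair actually present in a given data set. `mass_step` is a hypothetical name.

```python
from functools import reduce
from itertools import combinations
from math import gcd

# Table 2 residue masses in tenths of a dalton (313.2 Da -> 3132).
RESIDUE_DECI_DA = {"A": 3132, "C": 2892, "G": 3292, "T": 3042}


def mass_step(excluded: str) -> float:
    """Granularity (Da) of mass differences between equal-length fragments lacking `excluded`.

    Any such difference is an integer combination of the pairwise residue
    differences of the remaining three nucleotides, hence a multiple of their gcd.
    """
    remaining = [m for b, m in RESIDUE_DECI_DA.items() if b != excluded]
    diffs = [abs(x - y) for x, y in combinations(remaining, 2)]
    return reduce(gcd, diffs) / 10


for base in "ACGT":
    print(base, mass_step(base))
# A 5.0
# C 1.0
# G 3.0
# T 8.0
```

The steps of 5 Da (A excluded) and 8 Da (T excluded) are larger than those for G (3 Da) or C (1 Da), consistent with the preference for A or T cleavage stated above.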
TABLE 2

Panel A

              dAMP    dCMP    dGMP    dTMP
Mol. wt.      313.2   289.2   329.2   304.2
vs. dAMP      -       24      16      9
vs. dCMP              -       40      15
vs. dGMP                      -       25

Panel B

              dAMP    dCMP    dGMP    dTMP    2-chloroadenine MP
Mol. wt.      313.2   289.2   329.2   304.2   347.7
vs. dAMP      -       24      16      9
vs. dCMP              -       40      15      57.3
vs. dGMP                      -       25      17.3
vs. dTMP                              -       42.3

Table 2. Panel A. Masses of the four deoxynucleotide residues are
shown across the top, and calculated molecular weight differences
between each pair of nucleotide residues are shown in the table. Note that
chemically modified nucleotides will generally have different masses than
those shown above for the natural nucleotides. The mass difference
between a particular modified nucleotide and the other nucleotides will
vary depending on the modification. See the description of specific
nucleotide modifications and cleavage mechanisms for details of cleavage
products. Panel B. The mass differences between the natural nucleotides
and 2-chloroadenine are shown (far right column). The smallest mass
difference is 17.3 Da instead of 9 Da as in Panel A, providing
advantageous discrimination of nucleotides using mass spectrometry.
TABLE 3A

2mer   mass    3mer   mass    4mer   mass    5mer   mass
CC      596    CCC     885    CCCC   1174    CCCCC  1463
CT      611    CCT     900    CCCT   1189    CCCCT  1478
AC      620    CCA     909    CCCA   1198    CCCCA  1487
TT      626    CTT     915    CCTT   1204    CCCTT  1493
AT      635    CTA     924    CCTA   1213    CCCTA  1502
CG      636    CCG     925    CCCG   1214    CCCCG  1503
AA      644    TTT     930    CTTT   1219    CCTTT  1508
GT      651    CAA     933    CCAA   1222    CCCAA  1511
AG      660    TTA     939    CTTA   1228    CCTTA  1517
GG      676    CTG     940    CCTG   1229    CCCTG  1518
               TAA     948    TTTT   1234    CTTTT  1523
               CGA     949    CAAT   1237    CCTAA  1526
               TTG     955    CCAG   1238    CCCAG  1527
               AAA     957    TTTA   1243    CTTTA  1532
               TGA     964    CTTG   1244    CCTTG  1533
               CGG     965    CAAA   1246    CCAAA  1535
               AAG     973    TTAA   1252    TTTTT  1538
               TGG     980    CTAG   1253    CTTAA  1541
               GGA     989    CCGG   1254    CCTGA  1542
               GGG    1005    TTTG   1259    CCCGG  1543
                              TAAA   1261    TTTTA  1547
                              CAAG   1262    CTTTG  1548
                              TTAG   1268    CAATA  1550
                              CTGG   1269    CCAGA  1551
                              AAAA   1270    TTTAA  1556
                              TAAG   1277    CTTAG  1557
                              CAGG   1278    CCTGG  1558
                              TTGG   1284    CAAAA  1559
                              AAAG   1286    TTTTG  1563
                              TAGG   1293    TTAAA  1565
                              CGGG   1294    CTAGA  1566
                              AAGG   1302    CCGGA  1567
                              TGGG   1309    TTTGA  1572
                              AGGG   1318    CTTGG  1573
                              GGGG   1334    TAAAA  1574
                                             CAAAG  1575
                                             TTAAG  1581
                                             CTGGA  1582
                                             AAAAA  1583
                                             CCGGG  1583
                                             TTTGG  1588
                                             TAAAG  1590
                                             CAAGG  1591
                                             ATTGG  1597
                                             CTGGG  1598
                                             AAAAG  1599
                                             TAAGG  1606
                                             ACGGG  1607
                                             TTGGG  1613
                                             AAAGG  1615
                                             ATGGG  1622
                                             CGGGG  1623
                                             AAGGG  1631
                                             TGGGG  1638
                                             AGGGG  1647
                                             GGGGG  1663

Table 3. Masses of all possible compositions of 2mers, 3mers, 4mers and 5mers in order of mass 
in Daltons (Da), rounded to the nearest whole number for ease of presentation. (Other nucleotide 
orders are possible for many of the oligonucleotides.) The 5mers column is continued on the left 
under the 2mers. Note that two 5mers with different nucleotide content have the same mass 
(AAAAA and CCGGG, shaded at bottom right, both weigh 1504). The molecular masses are 
provided; ionization will change the masses. More generally, these masses are illustrative; actual 
masses will differ depending on the chemical modification, cleavage mechanism and polarity of 
ionization. 
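The entries in Table 3 can be regenerated by enumerating base compositions. In the sketch below, which illustrates the arithmetic and is not presented as the procedure used to prepare the table, each composition's mass is the sum of the rounded residue masses from Table 2 plus 18 Da. That constant is an assumption, chosen because it reproduces the tabulated values (for example, CC = 2 × 289 + 18 = 596); the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# Residue masses rounded to whole daltons (Table 2).
ROUNDED_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}
# Assumption: adding 18 Da to the summed residue masses reproduces Table 3.
END_CORRECTION = 18


def composition_masses(length):
    """Rounded mass of every base composition of the given length."""
    return {
        "".join(combo): sum(ROUNDED_MASS[b] for b in combo) + END_CORRECTION
        for combo in combinations_with_replacement("ACGT", length)
    }


def isobaric_groups(length):
    """Compositions of the given length that share the same rounded mass."""
    by_mass = defaultdict(list)
    for composition, mass in composition_masses(length).items():
        by_mass[mass].append(composition)
    return {mass: comps for mass, comps in by_mass.items() if len(comps) > 1}


print(isobaric_groups(5))  # {1583: ['AAAAA', 'CCGGG']}
```

Running the same check for 2mers through 4mers returns empty dictionaries, consistent with the single isobaric pair noted in the legend.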
Thus, for a given target analyte polynucleotide, if its sequence is known, it is
possible to determine whether cleavage at one or more of the base nucleotides
would produce any of the above confounding artifacts and then, by judicious choice
of experimental conditions, to avoid or resolve them.

Based on the preceding analysis, it can be seen that any difference in the
nucleotide sequence between two or more similar polynucleotides from different
members of a population will result in a difference in the pattern of fragments
obtained by cleavage of the polynucleotides, and thus a difference in the masses
seen in the mass spectrogram. Every variance will result in two mass changes: the
disappearance of a mass and the appearance of a new mass. In addition, if a
double-stranded polynucleotide is being analyzed, or if the two strands are
analyzed independently, the variance will change the mass of a fragment from each
of the two complementary strands of the target DNA, resulting in four mass changes
altogether (a mass disappearance and a mass appearance in each strand). The
presence of a second strand displaying mass changes provides a useful internal
corroboration of the presence of a variance. In addition, the sets of mass changes in
fragments from complementary strands can provide additional information regarding
the nature of the variance. Figs. 27-30 exemplify the detection of a mass difference
on both strands of a polynucleotide after full substitution and cleavage at modified
dA, at a variant position in the transferrin receptor gene. Table 5 shows the sets of
mass changes expected on complementary strands for all possible point mutations
(transitions and transversions). Once the mass spectrogram is obtained, it will be
immediately apparent whether the variance was an addition of one or more
nucleotides to a fragment (an increase in fragment mass of approximately 300 Da or
more), a deletion of one or more nucleotides from a fragment (a decrease in fragment
mass of approximately 300 Da or more) or a substitution of one or more nucleotides
for one or more other nucleotides (differences as shown in Table 5). Furthermore, if
the variance is a substitution, the exact nature of that substitution can also be
ascertained.
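The reasoning about complementary strands can be made concrete with a small calculation. The Python sketch below, using the panel A masses of Table 2, returns the mass change on the analyzed strand and on its complement for a point substitution, and lists the substitutions consistent with an observed single-strand shift. The function names and the 0.5 Da matching tolerance are assumptions for illustration; a real analysis would also have to account for instrument accuracy, ionization and the terminal chemistry of the fragments.

```python
# Residue masses (Da) from Table 2, panel A.
RESIDUE_MASS = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_shifts(ref, alt):
    """Fragment mass change (Da) caused by a ref->alt point substitution on
    the analyzed strand and on the complementary strand."""
    strand = RESIDUE_MASS[alt] - RESIDUE_MASS[ref]
    complement = RESIDUE_MASS[COMPLEMENT[alt]] - RESIDUE_MASS[COMPLEMENT[ref]]
    return round(strand, 1), round(complement, 1)


def candidate_substitutions(observed_shift, tolerance=0.5):
    """Point substitutions whose single-strand mass change matches observed_shift."""
    return [
        (ref, alt)
        for ref in RESIDUE_MASS
        for alt in RESIDUE_MASS
        if ref != alt
        and abs(RESIDUE_MASS[alt] - RESIDUE_MASS[ref] - observed_shift) <= tolerance
    ]


print(substitution_shifts("A", "G"))  # (16.0, -15.0)
print(candidate_substitutions(-24))   # [('A', 'C')]
```

An A to G transition, for example, raises one fragment by 16 Da and lowers the fragment from the complementary strand by 15 Da (T to C). Because the twelve possible signed shifts are all distinct at unit resolution, the sign and size of a single shift already point to one substitution, and the complementary strand supplies the corroboration described above.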
TABLE 3b

         Polynucleotides    Masses
Set 1    [illegible]        [illegible]
Set 2    [illegible]        [illegible]
Set 3    [illegible]        2617.707
         [illegible]        2617.711
Set 4    d(C10T1)           3196.090
         d(G10)             3196.137
Set 5    d(C6T1A4)          3292.134
         d(C13)             3292.190
Set 6    d(C13)             3759.457
         d(T7A1G4)          3759.472
Set 7    d(C5T9)            4183.751
         d(A6G7)            4183.779
Set 8    d(T7G7)            4433.899
         d(C11A4)           4433.936
TABLE 4 (part 1)

Cleavage at G

2mer       mass  mass Δ
CC          517
CT          532      15
AC          541       9
TT          547       6
AT          556       9
AA          565       9

3mer       mass  mass Δ
CCC         806
CCT         821      15
CCA         830       9
CTT         836       6
CTA         845       9
TTT         851       6
CAA         854       3
TTA         860       6
TAA         869       9
AAA         878       9

4mer       mass  mass Δ
CCCC       1095
CCCT       1110      15
CCCA       1119       9
CCTT       1125       6
CCTA       1134       9
CTTT       1140       6
CCAA       1143       3
CTTA       1149       6
TTTT       1155       6
CAAT       1158       3
TTTA       1164       6
CAAA       1167       3
TTAA       1173       6
TAAA       1182       9
AAAA       1191       9

5mer       mass  mass Δ
CCCCC      1384
CCCCT      1399      15
CCCCA      1408       9
CCCTT      1414       6
CCCTA      1423       9
CCTTT      1429       6
CCCAA      1432       3
CCTTA      1438       6
CTTTT      1444       6
CCTAA      1447       3
CTTTA      1453       6
CCAAA      1456       3
TTTTT      1459       3
CTTAA      1462       3
TTTTA      1468       6
CAATA      1471       3
TTTAA      1477       6
CAAAA      1480       3
TTAAA      1486       6
TAAAA      1495       9
AAAAA      1504       9

Cleavage at C

2mer       mass  mass Δ
TT          547
AT          556       9
AA          565       9
GT          572       7
AG          581       9
GG          597      16

3mer       mass  mass Δ
TTT         851
TTA         860       9
TAA         869       9
TTG         876       7
AAA         878       2
TGA         885       7
AAG         894       9
TGG         901       7
GGA         910       9
GGG         926      16

4mer       mass  mass Δ
TTTT       1155
TTTA       1164       9
TTAA       1173       9
TTTG       1180       7
TAAA       1182       2
TTAG       1189       7
AAAA       1191       2
TAAG       1198       7
TTGG       1205       7
AAAG       1207       2
TAGG       1214       7
AAGG       1223       9
TGGG       1230       7
AGGG       1239       9
GGGG       1255      16

5mer       mass  mass Δ
TTTTT      1459
TTTTA      1468       9
TTTAA      1477       9
TTTTG      1484       7
TTAAA      1486       2
TTTGA      1493       7
TAAAA      1495       2
TTAAG      1502       7
AAAAA      1504       2
TTTGG      1509       5
TAAAG      1511       2
ATTGG      1518       7
AAAAG      1520       2
TAAGG      1527       7
TTGGG      1534       7
AAAGG      1536       2
ATGGG      1543       7
AAGGG      1552       9
TGGGG      1559       7
AGGGG      1568       9
GGGGG      1584      16

Cleavage at A

2mer       mass  mass Δ
CC          517
CT          532      15
TT          547      15
CG          557      10
GT          572      15
GG          597      25

3mer       mass  mass Δ
CCC         806
CCT         821      15
CTT         836      15
CCG         846      10
TTT         851       5
CTG         861      10
TTG         876      15
CGG         886      10
TGG         901      15
GGG         926      25

4mer       mass  mass Δ
CCCC       1095
CCCT       1110      15
CCTT       1125      15
CCCG       1135      10
CTTT       1140       5
CCTG       1150      10
TTTT       1155       5
CTTG       1165      10
CCGG       1175      10
TTTG       1180       5
CTGG       1190      10
TTGG       1205      15
CGGG       1215      10
TGGG       1230      15
GGGG       1255      25

5mer       mass  mass Δ
CCCCC      1384
CCCCT      1399      15
CCCTT      1414      15
CCCCG      1424      10
CCTTT      1429       5
CCCTG      1439      10
CTTTT      1444       5
CCTTG      1454      10
TTTTT      1459       5
CCCGG      1464       5
CTTTG      1469       5
CCTGG      1479      10
TTTTG      1484       5
CTTGG      1494      10
CCGGG      1504      10
TTTGG      1509       5
CTGGG      1519      10
TTGGG      1534      15
CGGGG      1544      10
TGGGG      1559      15
GGGGG      1584      25

Cleavage at T

2mer       mass  mass Δ
CC          517
AC          541      24
CG          557      16
AA          565       8
AG          581      16
GG          597      16

3mer       mass  mass Δ
CCC         806
CCA         830      24
CCG         846      16
CAA         854       8
CGA         870      16
AAA         878       8
CGG         886       8
AAG         894       8
GGA         910      16
GGG         926      16

4mer       mass  mass Δ
CCCC       1095
CCCA       1119      24
CCCG       1135      16
CCAA       1143       8
CCAG       1159      16
CAAA       1167       8
CCGG       1175       8
CAAG       1183       8
AAAA       1191       8
CAGG       1199       8
AAAG       1207       8
CGGG       1215       8
AAGG       1223       8
AGGG       1239      16
GGGG       1255      16

5mer       mass  mass Δ
CCCCC      1384
CCCCA      1408      24
CCCCG      1424      16
CCCAA      1432       8
CCCGA      1448      16
CCAAA      1456       8
CCCGG      1464       8
CCAGA      1472       8
CAAAA      1480       8
CCGGA      1488       8
CAAAG      1496       8
AAAAA      1504       8
CCGGG      1504       0
CAAGG      1512       8
AAAAG      1520       8
ACGGG      1528       8
AAAGG      1536       8
CGGGG      1544       8
AAGGG      1552       8
AGGGG      1568      16
GGGGG      1584      16

Table 4 (part 1 of 2). Masses resulting from cleavage of oligonucleotides at specific nucleotides (G, C, A or T, as indicated).
Cleavage at G will produce fragments with no internal G residues; depending on the cleavage mechanism there may be a G at the 5'
or 3' end of the cleaved mass. In this table G has been omitted from the G cleavage fragments for ease of presentation (thus each
fragment could be considered one nucleotide longer); note that addition of a G to each of the G cleavage fragments would have no
effect on the mass differences between fragments (mass Δ). Similar considerations apply to C, A and T cleavage fragments.
Two 5mers with the same T cleavage mass (AAAAA and CCGGG, 1504) appear with a mass Δ of 0. Masses were calculated by adding
nucleotide masses rounded to the nearest whole number (and are therefore not exact, but the pattern of results is unaffected);
61 Daltons, the mass of a phosphate group, was subtracted from all fragments, since most cleavage mechanisms will result in
removal of one phosphate group.
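The legend's recipe (rounded residue masses summed, 61 Da subtracted, fragments lacking the cleaved nucleotide) can be expressed as a short Python sketch that rebuilds any block of Table 4 and collects the spacings between adjacent masses. The sketch is illustrative: it lists compositions in alphabetical order rather than the nucleotide orders used in the table, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Residue masses rounded to whole daltons, as in the Table 4 legend.
ROUNDED_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}
PHOSPHATE_CORRECTION = 61  # subtracted from every fragment (Table 4 legend)


def cleavage_table(cleaved_base, length):
    """(composition, mass, mass delta) rows, sorted by mass, for fragments of
    the given length that contain no copy of the cleaved base."""
    others = [b for b in "ACGT" if b != cleaved_base]
    rows = sorted(
        (sum(ROUNDED_MASS[b] for b in combo) - PHOSPHATE_CORRECTION, "".join(combo))
        for combo in combinations_with_replacement(others, length)
    )
    table, previous = [], None
    for mass, composition in rows:
        delta = None if previous is None else mass - previous
        table.append((composition, mass, delta))
        previous = mass
    return table


def distinct_deltas(cleaved_base, max_length):
    """Set of mass deltas seen for fragment lengths 2..max_length."""
    return {
        delta
        for length in range(2, max_length + 1)
        for _, _, delta in cleavage_table(cleaved_base, length)
        if delta is not None
    }


print(sorted(distinct_deltas("T", 5)))  # [0, 8, 16, 24]
```

For T cleavage the spacings up to 5mers are 0, 8, 16 and 24 Da, the zero being the AAAAA/CCGGG pair; this is the trade-off described above between wider spacing and a few more isobaric fragments.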
TABLE 4 (part 2)

Cleavage at G

6mer       mass  mass Δ
CCCCCC     1673
CCCCCT     1688      15
CCCCCA     1697       9
CCCCTT     1703       6
CCCCTA     1712       9
CCCTTT     1718       6
CCCCAA     1721       3
CCCTTA     1727       6
CCTTTT     1733       6
CCCTAA     1736       3
CCTTTA     1742       6
CCCAAA     1745       3
TTTTTC     1748       3
CCTTAA     1751       3
CTTTTA     1757       6
CCAAAT     1760       3
TTTTTT     1763       3
CTTTAA     1766       3
CCAAAA     1769       3
TTTTTA     1772       3
CTTAAA     1775       3
TTTTAA     1781       6
TAAAAC     1784       3
TTTAAA     1790       6
CAAAAA     1793       3
TTAAAA     1799       6
TAAAAA     1808       9
AAAAAA     1817       9

7mer       mass  mass Δ
CCCCCCC    1962
CCCCCCT    1977      15
CCCCCCA    1986       9
CCCCCTT    1992       6
CCCCCTA    2001       9
CCCCTTT    2007       6
CCCCCAA    2010       3
CCCCTTA    2016       6
CCCTTTT    2022       6
CCCCTAA    2025       3
CCCTTTA    2031       6
CCCCAAA    2034       3
CCTTTTT    2037       3
CCCTTAA    2040       3
CCTTTTA    2046       6
CCCAAAT    2049       3
CTTTTTT    2052       3
CCTTTAA    2055       3
CCCAAAA    2058       3
TTTTTCA    2061       3
CCTTAAA    2064       3
TTTTTTT    2067       3
TTTTAAC    2070       3
TAAAACC    2073       3
TTTTTTA    2076       3
TTTAAAC    2079       3
CCAAAAA    2082       3
AATTTTT    2085       3
CTTAAAA    2088       3
AAATTTT    2094       6
CTAAAAA    2097       3
AAAATTT    2103       6
CAAAAAA    2106       3
AAAAATT    2112       6
AAAAAAT    2121       9
AAAAAAA    2130       9

Cleavage at C

6mer       mass  mass Δ
TTTTTT     1763
TTTTTA     1772       9
TTTTAA     1781       9
TTTTTG     1788       7
TTTAAA     1790       2
TTTTAG     1797       7
TTAAAA     1799       2
TTTAAG     1806       7
TAAAAA     1808       2
TTTTGG     1813       5
TTAAAG     1815       2
AAAAAA     1817       2
TTTGGA     1822       5
AAAAGT     1824       2
TTAAGG     1831       7
AAAAAG     1833       2
TTTGGG     1838       5
AAAGGT     1840       2
ATTGGG     1847       7
AAAAGG     1849       2
TAAGGG     1856       7
TTGGGG     1863       7
AAAGGG     1865       2
AGGGGT     1872       7
AAGGGG     1881       9
GGGGGT     1888       7
AGGGGG     1897       9
GGGGGG     1913      16

7mer       mass  mass Δ
TTTTTTT    2067
TTTTTTA    2076       9
TTTTTAA    2085       9
TTTTTTG    2092       7
TTTTAAA    2094       2
TTTTTGA    2101       7
TTTAAAA    2103       2
TTTTAAG    2110       7
TTAAAAA    2112       2
TTTTTGG    2117       5
TTTAAAG    2119       2
TAAAAAA    2121       2
TTTTGGA    2126       5
TTAAAGA    2128       2
AAAAAAA    2130       2
TTTGGAA    2135       5
AAAAAGT    2137       2
GGGTTTT    2142       5
TTAAAGG    2144       2
AAAAAAG    2146       2
TTTGGGA    2151       5
AAAAGGT    2153       2
AATTGGG    2160       7
AAAAAGG    2162       2
GGGGTTT    2167       5
TAAAGGG    2169       2
TTGGGGA    2176       7
AAAAGGG    2178       2
AAGGGGT    2185       7
GGGGGTT    2192       7
AAAGGGG    2194       2
AGGGGGT    2201       7
AAGGGGG    2210       9
GGGGGGT    2217       7
AGGGGGG    2226       9
GGGGGGG    2242      16

Cleavage at A

6mer       mass  mass Δ
CCCCCC     1673
CCCCCT     1688      15
CCCCTT     1703      15
CCCCCG     1713      10
CCCTTT     1718       5
CCCCTG     1728      10
CCTTTT     1733       5
CCCTTG     1743      10
TTTTTC     1748       5
CCCCGG     1753       5
CCTTTG     1758       5
TTTTTT     1763       5
CCCTGG     1768       5
TTTTCG     1773       5
CCTTGG     1783      10
TTTTTG     1788       5
CCCGGG     1793       5
TTTCGG     1798       5
CCTGGG     1808      10
TTTTGG     1813       5
TTCGGG     1823      10
CCGGGG     1833      10
TTTGGG     1838       5
TGGGGC     1848      10
TTGGGG     1863      15
GGGGGC     1873      10
GGGGGT     1888      15
GGGGGG     1913      25

7mer       mass  mass Δ
CCCCCCC    1962
CCCCCCT    1977      15
CCCCCTT    1992      15
CCCCCCG    2002      10
CCCCTTT    2007       5
CCCCCTG    2017      10
CCCTTTT    2022       5
CCCCTTG    2032      10
CCTTTTT    2037       5
CCCCCGG    2042       5
CCCTTTG    2047       5
CTTTTTT    2052       5
CCCCTGG    2057       5
CCTTTTG    2062       5
TTTTTTT    2067       5
CCCTTGG    2072       5
CTTTTTG    2077       5
CCCCGGG    2082       5
CTTTCGG    2087       5
TTTTTTG    2092       5
CCCTGGG    2097       5
CTTTTGG    2102       5
CCTTGGG    2112      10
GGTTTTT    2117       5
CCCGGGG    2122       5
CTTTGGG    2127       5
TGGGGCC    2137      10
GGGTTTT    2142       5
CTTGGGG    2152      10
GGGGGCC    2162      10
GGGGTTT    2167       5
GGGGGTC    2177      10
GGGGGTT    2192      15
CGGGGGG    2202      10
GGGGGGT    2217      15
GGGGGGG    2242      25

Cleavage at T

6mer       mass  mass Δ
CCCCCC     1673
CCCCCA     1697      24
CCCCCG     1713      16
CCCCAA     1721       8
CCCCAG     1737      16
CCCAAA     1745       8
CCCCGG     1753       8
CCCAAG     1761       8
CCAAAA     1769       8
CCCGGA     1777       8
CCAAAG     1785       8
CCCGGG     1793       8
CAAAAA     1793       0
CCAAGG     1801       8
CAAAAG     1809       8
CCGGGA     1817       8
AAAAAA     1817       0
AAACGG     1825       8
AAAAAG     1833       8
CCGGGG     1833       0
AACGGG     1841       8
AAAAGG     1849       8
ACGGGG     1857       8
AAAGGG     1865       8
GGGGGC     1873       8
AAGGGG     1881       8
AGGGGG     1897      16
GGGGGG     1913      16

7mer       mass  mass Δ
CCCCCCC    1962
CCCCCCA    1986      24
CCCCCCG    2002      16
CCCCCAA    2010       8
CCCCCGA    2026      16
CCCCAAA    2034       8
CCCCCGG    2042       8
CCCCAAG    2050       8
CCCAAAA    2058       8
CCCCGGA    2066       8
CCCAAAG    2074       8
CCAAAAA    2082       8
CCCCGGG    2082       0
CCCGGAA    2090       8
CCAAAAG    2098       8
CCCGGGA    2106       8
CAAAAAA    2106       0
CCAAAGG    2114       8
CAAAAAG    2122       8
CCCGGGG    2122       0
CCGGGAA    2130       8
AAAAAAA    2130       0
AAAACGG    2138       8
AAAAAAG    2146       8
CCGGGGA    2146       0
AAACGGG    2154       8
AAAAAGG    2162       8
CCGGGGG    2162       0
AACGGGG    2170       8
AAAAGGG    2178       8
AGGGGGC    2186       8
AAAGGGG    2194       8
CGGGGGG    2202       8
AAGGGGG    2210       8
AGGGGGG    2226      16
GGGGGGG    2242      16

Table 4 (part 2 of 2). Masses resulting from cleavage of oligonucleotides at specific nucleotides (G, C, A or T, as indicated). See
legend to part 1 of this Table. Note that the two 5mers with the same T cleavage mass (part 1) continue to propagate through the
T cleavage masses (entries with a mass Δ of 0).
E. Serial Cleavage

The preceding discussion focuses primarily on the use of one cleavage
reaction with any given modified polynucleotide. However, it is also possible, and it
is a further aspect of this invention, to serially cleave a polynucleotide in which two
or more natural nucleotides have been replaced with two or more modified
nucleotides which have different cleavage characteristics. That is, a polynucleotide
that contains two or more types of modified nucleotides, either fully or partially
substituted, can be cleaved by serial exposure to different cleavage conditions,
either chemical, physical or both. One preferred embodiment of this approach is
tandem mass spectrometry, where fragmented molecular species produced by one
procedure can be retained in a suitable mass spectrometer (e.g., a Fourier-transform
ion cyclotron resonance mass spectrometer or an ion trap mass spectrometer) for
subsequent exposure to a second physical/chemical procedure that results in
activation and cleavage at a second modified nucleotide. The product ions may be
subjected to a third and even a fourth cleavage condition directed to specific
modifications on a third and fourth nucleotide, to enable observation of precursor-
product relationships between the input (precursor) ions and those generated during
each round of cleavage. A continuous or stepwise gradient of cleavage conditions of
increasing efficiency may be used to enhance the elucidation of precursor-product
relationships between ions.

The production of a polynucleotide containing multiple modified nucleotides
reduces the need to perform multiple polymerizations on the same template to
produce a set of polynucleotides each with a different single modified nucleotide; i.e.,
one for cleavage at A, one for G, one for T and one for C. Also, the serial
application of cleavage procedures specific for different nucleotides of a single
polynucleotide enhances detection of precursor-product relationships, which is
useful for determining DNA sequence. Figure 21 shows the production of a
polynucleotide modified by complete substitution of riboGTP for dGTP and
5'-amino-TTP for dTTP, followed by cleavage with base, which results in cleavage at
G, or cleavage with acid, which results in cleavage at T. Subsequent treatment of the
base-cleaved fragments with acid, or vice versa, results in further fragmentation into
double (G and T) cleaved fragments. This would be useful, for example and without
limitation, for identifying a variance at position 27 (dA) of the sequence (Fig. 21). That
is, as can be seen in Fig. 21, cleavage at G alone produces the fragment ACTTCACCG
(position 27 is highlighted), which contains two dA residues. A change in mass of
this fragment of -24 Da, indicating an A to C change, would not permit determination
of which of the two dA residues changed to dC. Similarly, cleavage at T alone gives
the fragment TCACCGGCACCA, which contains three dA residues and likewise
prevents determination of which dA was changed. However, double cleavage at G
and T produces the fragment TCACCG, which undergoes the -24 Da mass shift and,
because it contains only one dA, allows definitive assignment of the variant
nucleotide. Schemes using this approach to precisely detect variances at other
nucleotides will be apparent to those skilled in the art based on the disclosures
herein and are within the scope of this invention.
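The benefit of double cleavage can be checked by simulating complete cleavage of a short sequence. The excerpt used below is hypothetical, assembled only to be consistent with the fragments named above (ACTTCACCG, TCACCGGCACCA and TCACCG); it is not the sequence of Fig. 21. The sketch also assumes, from those fragment ends, that base cleavage at riboG leaves G on the 3' end of the upstream fragment and that acid cleavage at 5'-amino-T leaves T on the 5' end of the downstream fragment.

```python
def cleave(seq, after=frozenset(), before=frozenset()):
    """Complete cleavage of seq: cut 3' of each base in `after` and 5' of each
    base in `before`."""
    fragments, start = [], 0
    for i, base in enumerate(seq):
        if base in before and i > start:
            fragments.append(seq[start:i])
            start = i
        if base in after:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def fragment_containing(fragments, index):
    """Return the fragment that covers 0-based position `index`."""
    position = 0
    for fragment in fragments:
        if position <= index < position + len(fragment):
            return fragment
        position += len(fragment)
    raise IndexError(index)


SEQ = "GACTTCACCGGCACCAT"  # hypothetical excerpt, not the actual Fig. 21 sequence
VARIANT = 6                 # 0-based index of the dA of interest

for label, kwargs in [("G only", {"after": {"G"}}),
                      ("T only", {"before": {"T"}}),
                      ("G and T", {"after": {"G"}, "before": {"T"}})]:
    fragment = fragment_containing(cleave(SEQ, **kwargs), VARIANT)
    print(label, fragment, fragment.count("A"))
# G only ACTTCACCG 2
# T only TCACCGGCACCA 3
# G and T TCACCG 1
```

Only the doubly cleaved fragment carries a single dA, so a -24 Da shift in it can be assigned to one position.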

A further aspect of this invention is an algorithm or algorithms which permit the
use of computers to directly infer DNA sequence or the presence of variances from
mass spectrometry.
F. Parallel Cleavage 

It is likewise possible, and it is a further aspect of this invention, that a
polynucleotide which has been substituted with two or more modified nucleotides,
each of which is susceptible to a different cleavage procedure, may be analyzed in
parallel fashion. That is, one can divide the polynucleotide into aliquots and expose
each aliquot to a cleavage procedure specific for one of the modified nucleotides. This
saves the effort of performing independent polymerization reactions for each of the
modified nucleotides. This approach can be used to generate sequence ladders, or
to generate complete cleavage products for variance detection. As reviewed in
Example 5, complete cleavage at two different nucleotides (performed independently),
followed by mass spectrometry, substantially increases the efficiency of variance
detection compared to cleavage at a single nucleotide.

For example, consider a single polynucleotide substituted with ribo-A,
5'-amino-C and 5'-(bridging) thio-G nucleotides. All three modified nucleotides are
known to be incorporated by polymerases. Sequence ladders can be produced from
such a modified polynucleotide by exposure of one aliquot to acid, resulting in
cleavage at C; exposure of a second aliquot to base, resulting in cleavage at A; and
exposure of a third aliquot to silver or mercury salts, resulting in cleavage at G. It is
possible that a polynucleotide produced with the three above modified nucleotides
plus 4'-C-acyl T could also (separately) be exposed to UV light to produce cleavage
at T, resulting in a complete set of sequencing reactions from a single polymerization
product.

G. Combination of modified nucleotide cleavage and chain termination 

Another application of modified nucleotide incorporation and cleavage is to
combine it with a chain termination procedure. By incorporating one or more
modified nucleotides in a polymerization procedure (for example but without
limitation, modified A) together with a different chain-terminating nucleotide, such
as a dideoxy-G, a Sanger-type ladder of fragments terminating at the
dideoxynucleotide can be generated. Subsequent exposure of this ladder of
fragments to a chemical that cleaves at the modified A will result in further
fragmentation, with the resulting fragments terminating 5' to A and 3' to either A
(most of the time) or G (in one fragment per chain termination product). Comparison
of the resulting fragment set with a fragment set produced solely by substitution and
cleavage at the modified nucleotide (A) will be instructive: all the fragments will be the
same except for the presence of extra fragments in the chain-terminating set which
end at a 3' G. On mass spectrometric analysis, these would provide the mass (and, by
inference, the nucleotide content) of all fragments in which an A is followed (directly
or after some interval) by a G without an intervening A. Derivation of similar data
using other chain-terminating nucleotides and other cleavage nucleotides will
cumulatively provide a set of data useful for determining the sequence of the
polymerization products.
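The comparison between the chain-terminated fragment set and the simple cleavage set can be sketched in Python. The sketch assumes cleavage 3' of each modified A and treats every chain termination product as a prefix of the full product ending in the dideoxy-G; the sequence and the function names are hypothetical. Sets are used, so the sketch compares which fragments occur, not their abundance.

```python
def cleave_after(seq, base):
    """Complete cleavage 3' of every occurrence of `base` (assumed convention)."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def terminated_products(seq, terminator):
    """Sanger-type ladder: every prefix of seq that ends at `terminator`."""
    return [seq[:i + 1] for i, b in enumerate(seq) if b == terminator]


def extra_fragments(seq, cleave_base="A", terminator="G"):
    """Fragments present after ladder + cleavage but absent after cleavage alone."""
    reference = set(cleave_after(seq, cleave_base))
    ladder = set()
    for product in terminated_products(seq, terminator):
        ladder.update(cleave_after(product, cleave_base))
    return sorted(ladder - reference)


print(extra_fragments("CAGTTACGA"))  # ['CG', 'G']
```

Each extra fragment ends in G and spans the stretch between an A and the next G, which is the information the comparison above is meant to reveal.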

H. Cleavage-resistant modified nucleotide substitution and mass-shifting
nucleotides

The preceding embodiments of this invention relate primarily to the
substitution into a polynucleotide of one or more modified nucleotides which have
the effect of enhancing the susceptibility of the polynucleotide to cleavage at the
site(s) of incorporation of the modified nucleotide(s) in comparison to unmodified
nucleotides. It is entirely possible, however, and it is yet another aspect of this
invention, to use a modified nucleotide which, when incorporated into a
polynucleotide, reduces susceptibility to cleavage at the site of incorporation
compared to unmodified sites. In this scenario, cleavage would then occur at
unmodified sites in the polynucleotide. Alternatively, a combination of
cleavage-resistant and cleavage-sensitive modified nucleotides may be incorporated
into the same polynucleotide to optimize the differential between cleavable and
non-cleavable sites.

An example of a modified nucleotide which imparts this type of resistance to
cleavage is the 2'-fluoro derivative of any natural nucleotide. The 2'-fluoro
derivative has been shown to be substantially less susceptible to fragmentation in a
mass spectrometer than unsubstituted natural nucleotides.

As shown in Table 2, the mass differences between the naturally occurring
nucleotides range from 9 to 40 Da and are sufficient for resolving single nucleotide
differences in all fragments of 25mer size and under. However, it may be desirable
to increase the mass difference between the four nucleotides, or between any pair of
nucleotides, to simplify their detection by mass spectrometry. This is illustrated for
dA and its 2-chloroadenine analog in Table 2. That is, substitution with
2-chloroadenine (mass 347.7) increases the A-T mass difference from 9 Da to 42.3
Da, the A-C difference from 24 to 57.3 Da and the A-G difference from 16 to 17.3
Da. Other mass-shifting nucleotide analogs are known in the art, and it is an aspect
of this invention that they may be used to advantage with the mass spectrometric
methods of this invention.

I. Applications

A number of applications of the methods of the present invention are
described below. It is understood that these descriptions are exemplary only and
are not intended to be, nor are they to be construed as being, limiting on the scope of
this invention in any manner whatsoever. Thus, other applications of the methods
described herein will become apparent to those skilled in the art based on the
disclosures herein; such applications are within the scope of this invention.

a. Full substitution, full extension and complete cleavage.
In one aspect of the present invention, at least one of the four nucleotides of
which the target polynucleotide is composed is completely replaced with a modified
nucleotide (either on one strand using primer extension, or on both strands
using a DNA amplification procedure), a full-length polynucleotide is made and
substantially complete cleavage is effected. The result will be cleavage of the modified
polynucleotides into fragments averaging four nucleotides in length. This is so
because the abundance of A, T, G and C nucleotides is roughly equal in most
genomes and their distribution is semi-random; therefore a particular nucleotide
occurs approximately once every four nucleotides in a natural polynucleotide
sequence. There will, of course, be a distribution of sizes, with considerable
deviation from the average size due to the non-random nature of the sequence of
biological polynucleotides and the unequal amounts of A:T vs. G:C base pairs in
different genomes. The extended primer (whether from primer extension or
amplification) will not be cleaved until the first occurrence of a modified nucleotide
after the end of the primer, resulting in fragments longer than 15 nt (i.e., longer than
the primer). Often, these primer-containing fragments will be the largest or among
the largest produced. This can be advantageous in the design of genotyping assays.
That is, primers can be designed so that the first occurrence of a polymorphic
nucleotide position is after the primer. After cleavage, the genotype can be
determined from the length of the primer-containing fragment. This is illustrated in
Figs. 27-32. Due to this variation in the size of analyte masses, it is essential that
the mass spectrometer be capable of detecting polynucleotides ranging up to
20mers, or even 30mers, with a level of resolution and mass accuracy consistent
with unambiguous determination of the nucleotide content of each mass. As
discussed below, this requirement has different implications depending on whether
the nucleotide sequence of the analyte polynucleotide is already known (as will
generally be the case with variance detection or genotyping) or not (as will be the
case with de novo DNA sequencing).
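The genotyping idea, reading the genotype from the length of the primer-containing fragment, can be illustrated with a short Python sketch. The primer, the two extension products and the assumption that cleavage occurs 3' of each modified base are all hypothetical; with chemistries that cut 5' of the modified base, each length would be one nucleotide shorter.

```python
def primer_fragment(product, primer_len, cleave_base):
    """Primer-containing fragment after complete cleavage, assuming the primer
    itself is unmodified and cleavage occurs 3' of each modified base."""
    for i in range(primer_len, len(product)):
        if product[i] == cleave_base:
            return product[: i + 1]
    return product  # no modified base downstream of the primer


PRIMER = "GTCCTGCTCGTCCTG"     # hypothetical 15-nt primer
ALLELE_1 = PRIMER + "TTAGCCA"  # hypothetical extension product, allele 1
ALLELE_2 = PRIMER + "TTGGCCA"  # same product with an A->G change

for name, product in [("allele 1", ALLELE_1), ("allele 2", ALLELE_2)]:
    print(name, len(primer_fragment(product, len(PRIMER), "A")), "nt")
# allele 1 18 nt
# allele 2 22 nt
```

Here the A/G difference at the third position after the primer changes the primer-containing fragment from 18 to 22 nt, a difference of more than 1,000 Da.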

i. Applications to variance detection 

Variance detection is usually performed on an analyte DNA or cDNA
sequence for which at least one reference sequence is available. The concern of
variance detection is to examine a set of corresponding sequences from different
individuals (sample sequences) in order to identify sequence differences between
the reference and sample sequences or among the sample sequences. Such
sequence variances will be identified and characterized by the existence of different
masses among the cleaved sample polynucleotides.

Depending on the scope of the variance detection procedure, analyte
fragments of different lengths may be optimal. For genotyping, it is desirable that
one primer be close to the known variant site.

Generally an analyte fragment of at least 50 nucleotides, more preferably at
least 100 nucleotides and still more preferably at least 200 nucleotides will be
produced by polymerase incorporation of modified nucleotides (either A, G, C or T),
followed by cleavage at the sites of modified nucleotide incorporation and mass
spectrometric analysis of the resulting products. Given the frequency of nucleotide
variances (estimated at one in 200 to one in 1000 nucleotides in the human
genome), there will generally be zero, one or two cleavage fragments that differ
between any two samples. The fragments that differ among the samples may
range in size from a monomer to a 10mer, less frequently up to a 20mer or, rarely, a
fragment of even greater length; however, as noted above, the average cleavage
fragment will be about 4 nucleotides. Knowledge of the reference sequence can be
used to avoid cleavage schemes that would generate very large cleavage products
and, more generally, to enhance the detectability of any sequence variation that may
exist among the samples by computing the efficiency of variance detection at each
nucleotide position for all possible cleavage schemes, as outlined below. However,
large fragments are not really a problem when a reference sequence is available
and the analyte fragment length is only several hundred nucleotides, because it is
extremely unlikely that any analyte fragment will contain two large cleavage masses
that are close in size. In general, if there are only a few large fragments they can be
easily identified and, as Table 5 shows, even with a MALDI instrument capable of a
mass resolution of only 1000, the most difficult substitution, an A<->T change
resulting in a 9 amu shift, can be detected in a 27mer.
TABLE 5

                         Resolving Power of MS Instrument (FWHM)
Substitution   Δ (Da)    1,000     1,500     2,000     10,000
                         Maximum fragment in which Δ at left is resolvable
C<->G          40        123 nt    184 nt    246 nt    1,230 nt
G<->T          25        77 nt     116 nt    154 nt    770 nt
A<->C          24        74 nt     111 nt    148 nt    740 nt
A<->G          16        49 nt     74 nt     98 nt     490 nt
C<->T          15        46 nt     69 nt     92 nt     460 nt
A<->T          9         27 nt     41 nt     55 nt     270 nt

Table 5. This table summarizes the relation between mass spectrometer resolution
and nucleotide changes in determining the maximum size fragment in which a
given base change can be identified. The maximum size DNA fragment (in
nucleotides; nt) in which a base substitution can theoretically be resolved is
provided in the four columns at right (bottom 6 rows) for each possible nucleotide
substitution, listed in the column at left. As is evident from the table, the mass
difference created by each substitution (Δ, measured in Daltons) and the resolving
power of the mass spectrometer determine the size limit of fragments that can be
successfully analyzed. Commercially available MALDI instruments can resolve
from 1 part in 1,000 to 1 part in 5,000 (FWHM), while available ESI
instruments can resolve 1 part in 10,000. Modified ESI MS instruments are
capable of at least 10-fold greater mass resolution. (The theoretical resolution
numbers in the table do not take into consideration limitations on actual resolution
imposed by the isotopic heterogeneity of molecular species and the technical
difficulty of efficiently obtaining large ions.) FWHM (full width at half maximal
height) is a standard measure of mass resolution. (For further information on
resolution and mass accuracy in MS see, for example: Siuzdak, G. Mass
Spectrometry for Biotechnology. Academic Press, San Diego, 1996.)
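The entries in Table 5 follow from the definition of resolving power, R = M/ΔM (FWHM): a substitution with mass shift Δ remains resolvable in a fragment of mass M as long as M ≤ R·Δ. The minimal sketch below reproduces that arithmetic. It assumes an average nucleotide mass of about 325 Da, a value inferred from the table rather than stated in it, and rounds down to whole nucleotides; under these assumptions it matches the tabulated values to within one or two nucleotides. The names `max_resolvable_length` and `AVG_NT_MASS` are illustrative only.

```python
import math

# Mass shift (Da) produced by each substitution, as listed in Table 5.
SUBSTITUTION_DELTA = {
    "C<->G": 40, "G<->T": 25, "A<->C": 24,
    "A<->G": 16, "C<->T": 15, "A<->T": 9,
}

# Assumed average mass of one nucleotide residue in Da (inferred, not given in Table 5).
AVG_NT_MASS = 325.0


def max_resolvable_length(delta_da, resolving_power, avg_nt_mass=AVG_NT_MASS):
    """Longest fragment (nt) in which a shift of delta_da is still resolved.

    With resolving power R = M / dM (FWHM), a peak of mass M can be separated
    from one delta_da away as long as M / R <= delta_da, i.e. M <= R * delta_da.
    """
    max_mass = resolving_power * delta_da
    return math.floor(max_mass / avg_nt_mass)


if __name__ == "__main__":
    for sub, delta in SUBSTITUTION_DELTA.items():
        row = [max_resolvable_length(delta, r) for r in (1_000, 1_500, 2_000, 10_000)]
        print(f"{sub:6} {delta:3} Da " + " ".join(f"{n:6} nt" for n in row))
```

Rounding down is used because a fragment is resolvable only if its full length satisfies the inequality.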
In order to select experimental conditions for variance detection that maximize the
likelihood of success, one can use the reference sequence to predict the fragments
that would be produced by cleavage at A, G, C or T in advance of experimental
work. Based on such an analysis, the optimal modified nucleotide substitution and
cleavage scheme can be selected for each DNA or cDNA sequence that is to be
analyzed. Such an analysis can be performed as follows:

• For each nucleotide of the test polynucleotide, substitute each of the three 
other possible nucleotides and generate an associated mass change. For 
example, if at position 1 the test polynucleotide begins with A, then generate 
hypothetical polynucleotides beginning with T, G and C. Next move to 
position two of the test sequence and again make all three possible 
substitutions, and so forth for all positions of the test polynucleotide. If the 
test polynucleotide is 100 nucleotides in length then altogether 300 new 
hypothetical fragments will be generated by this procedure on one strand and 
another 300 on the complementary strand. Each set of three substitutions 
can then be analyzed together. 

• Generate the masses that would be produced by cleaving, at T, C, G or A,
each of the three new hypothetical test fragments obtained by the
substitutions of T, C or G for A at position 1. Compare these mass sets with
the set of masses obtained from the reference sequence (which in our 
example has A at position 1). For each of the four cleavages (T, C, G, A), 
determine whether the disappearance of an existing mass or the generation 
of a new mass would create a difference in the total set of masses. If a 
difference is created, determine whether it is a single difference or two 
differences (i.e. a disappearance of one mass and an appearance of 
another). Also determine the magnitude of the mass difference compared to 
the set of masses generated by cleavage of the reference sequence. 
Perform this same analysis for each of the 100 positions of the test sequence, 
in each case examining the consequences of each of the four possible base- 
specific cleavages, i.e., for DNA, at A, C, G and T. 

• Generate a correlation score for each of the four possible base-specific
cleavages. The correlation score increases in proportion to the fraction of the
300 possible deviations from the reference sequence that produce one or
more mass changes (with two mass differences scoring higher than one),
and in proportion to the extent of the mass differences (greater mass
differences score higher than small ones).

• In the case of primer extension, the analysis is performed for one strand; in
the case of amplification, the computation is carried out on the products of
cleavage of both strands.
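The substitution scan described in the steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the inventors' program. The residue masses are approximate average masses of DNA nucleotide residues. The cleaved nucleotide is assumed to be lost, with end-group chemistry ignored. Masses are compared as sets because the spectrum is treated as non-quantitative, and two masses count as distinct only if they differ by more than m/R. The score is simply the fraction of detectable substitutions, which is a simplified stand-in for the correlation score above because it ignores the number and size of the mass changes. All function names are hypothetical.

```python
# Approximate average residue masses (Da) of DNA nucleotides in a chain (assumed values).
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def fragment_masses(seq, base):
    """Set of fragment masses after complete cleavage at every `base`.

    Simplifying assumptions: the cleaved nucleotide is lost, end-group
    chemistry is ignored, and a set (not a count) is returned because the
    mass spectrum is treated as non-quantitative.
    """
    return {sum(RESIDUE_MASS[n] for n in frag) for frag in seq.split(base) if frag}


def _unmatched(masses, others, resolving_power):
    # Masses with no partner in `others` closer than m / R (FWHM criterion).
    return [m for m in masses if all(abs(m - x) > m / resolving_power for x in others)]


def is_detectable(ref, variant, base, resolving_power):
    """True if cleaving at `base` gives a new or missing mass for `variant`."""
    r, v = fragment_masses(ref, base), fragment_masses(variant, base)
    return bool(_unmatched(r, v, resolving_power) or _unmatched(v, r, resolving_power))


def detection_efficiency(ref, base, resolving_power=1000):
    """Fraction of all 3 * len(ref) single-base substitutions that are detectable."""
    detected = total = 0
    for i, orig in enumerate(ref):
        for alt in "ACGT":
            if alt == orig:
                continue
            total += 1
            variant = ref[:i] + alt + ref[i + 1:]
            detected += is_detectable(ref, variant, base, resolving_power)
    return detected / total


def best_cleavage_base(ref, resolving_power=1000):
    """Base-specific cleavage with the highest detection efficiency for `ref`."""
    return max("ACGT", key=lambda b: detection_efficiency(ref, b, resolving_power))
```

For example, with cleavage at A, the change CTACCATC to CCACCATC is invisible. The fragment CT becomes CC, yet both CC and the CT-equivalent mass (TC) are still present. This is the masking effect of a non-quantitative readout.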

The above method can be extended to the use of combinations of
substitution and cleavage. Examples include T cleavage on each of the strands of the
analyte polynucleotide (either independent or simultaneous cleavage of both strands
at T), cleavage at T and A on one strand (again, either independent or
simultaneous cleavage), or cleavage of one strand at T and of the
complementary strand at A, and so forth. Based on the correlation scores
generated for each of the different schemes, an optimal scheme can be
determined in advance of experimental work.
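The comparison of single-strand and two-strand schemes can be illustrated with a small sketch. A substitution counts as detected if it changes the predicted mass set on at least one analyzed strand. The assumptions are the same kind of simplifications as before: approximate average residue masses, the cleaved residue dropped, end groups ignored, and exact comparison of masses rounded to 0.01 Da (that is, ideal resolution). The ranking is therefore only a rough guide. The names `scheme_efficiency` and `rank_schemes` are hypothetical.

```python
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}  # assumed, Da
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mass_set(seq, base):
    # Complete cleavage at `base`; cleaved residue dropped, end groups ignored,
    # masses rounded to 0.01 Da and compared exactly (ideal resolution).
    return {round(sum(RESIDUE_MASS[n] for n in f), 2) for f in seq.split(base) if f}


def scheme_efficiency(ref, plus_base, minus_base=None):
    """Fraction of single-base substitutions in `ref` seen on at least one strand.

    plus_base: base cleaved on the given strand; minus_base: base cleaved on
    the complementary strand, or None if only one strand is analyzed.
    """
    ref_plus = mass_set(ref, plus_base)
    ref_minus = mass_set(revcomp(ref), minus_base) if minus_base else None
    seen = total = 0
    for i, orig in enumerate(ref):
        for alt in "ACGT":
            if alt == orig:
                continue
            total += 1
            var = ref[:i] + alt + ref[i + 1:]
            hit = mass_set(var, plus_base) != ref_plus
            if minus_base and not hit:
                hit = mass_set(revcomp(var), minus_base) != ref_minus
            seen += hit
    return seen / total


def rank_schemes(ref):
    """All single- and two-strand schemes, best detection efficiency first."""
    schemes = [(p, m) for p in "ACGT" for m in (None, "A", "C", "G", "T")]
    scored = [(scheme_efficiency(ref, p, m), p, m) for p, m in schemes]
    return sorted(scored, key=lambda s: s[0], reverse=True)
```

By construction, adding a second strand can only raise a scheme's score. The ranking shows which pairing gains the most for a given reference sequence.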

A computer program can be constructed to accomplish the above task. Such
a program can also be extended to encompass the analysis of experimental
cleavage masses. That is, the program can be constructed to compare all the
masses in the experimentally determined mass spectrum with the cleavage masses
expected from cleavage of the reference sequence and to flag any new or missing
masses. If there are new or missing masses, the experimental set of masses can be
compared with the masses generated in the computational analysis of all the
possible nucleotide substitutions, insertions or deletions associated with the
experimental cleavage conditions. However, nucleotide substitutions are about ten
times more common than insertions or deletions, so an analysis of substitutions
alone should be useful. In one embodiment, the computational analysis data for all
possible nucleotide insertions, deletions and substitutions can be stored in a look-up
table. The set of computational masses that matches the experimental data then
provides the new variant sequence or, at a minimum, the restricted
set of possible sequences of the new variant sequence. (The location and chemical
nature of a substitution may not be uniquely specified by one cleavage experiment.)
Resolving all ambiguity concerning the nucleotide sequence of a variant sample

WO 00/18967

152

PCT/US99/22988

may require, in some cases, another substitution and cleavage experiment (see
Section E, Serial Cleavage and DNA sequencing applications described below) or
some other sequencing method (e.g. conventional sequencing methods or
sequencing by hybridization). It may be advantageous to routinely
perform multiple different substitution and cleavage experiments on all samples to
maximize the fraction of variances that can be precisely assigned to a specific
nucleotide.
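The comparison and look-up steps described above can be sketched as follows. This is a minimal illustration that covers single-base substitutions only; insertions and deletions are omitted. Observed masses are compared with the reference prediction within a fixed tolerance, new and missing masses are flagged, and the observed set is then matched against a precomputed table keyed by each variant's predicted mass set. The residue masses, the tolerance, the cleavage model (cleaved residue lost, end groups ignored) and the function names are all assumptions for illustration. For brevity the table is rebuilt on every call, although in practice it would be computed once per reference and cleavage scheme.

```python
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}  # assumed, Da


def mass_set(seq, base):
    # Complete cleavage at `base`; cleaved residue dropped, end groups ignored.
    return {round(sum(RESIDUE_MASS[n] for n in f), 2) for f in seq.split(base) if f}


def _covered(masses, others, tol):
    # True if every mass in `masses` has a partner in `others` within tol.
    return all(any(abs(m - o) <= tol for o in others) for m in masses)


def build_lookup(ref, base):
    """Predicted mass set -> list of (1-based position, ref base, alt base)."""
    table = {}
    for i, orig in enumerate(ref):
        for alt in "ACGT":
            if alt != orig:
                variant = ref[:i] + alt + ref[i + 1:]
                key = frozenset(mass_set(variant, base))
                table.setdefault(key, []).append((i + 1, orig, alt))
    return table


def interpret(observed, ref, base, tol=0.5):
    """Flag new/missing masses and list substitutions consistent with `observed`."""
    expected = mass_set(ref, base)
    new = sorted(m for m in observed if not _covered([m], expected, tol))
    missing = sorted(e for e in expected if not _covered([e], observed, tol))
    if not new and not missing:
        return new, missing, []
    candidates = [
        sub
        for masses, subs in build_lookup(ref, base).items()
        if _covered(masses, observed, tol) and _covered(observed, masses, tol)
        for sub in subs
    ]
    return new, missing, sorted(candidates)
```

When more than one substitution matches, the candidate list is the "restricted set of possible sequences" described above, and a further cleavage experiment would be needed to decide among them.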

The inventors have performed a computational analysis of natural
polynucleotides of 50, 100, 150, 200 and 250 nucleotides and discovered that
combinations of two nucleotide cleavages (for example, cleavage at A on one strand
and at G on the complementary strand) result in 99-100% detection efficiency,
considering all possible substitutions in fragments of up to 250 nt. Potentially useful,
but sometimes less than 100% sensitive, analyses can be performed on longer
fragments of up to 1000 nt. See Example 5 for details of this analysis.

ii. Applications to DNA sequencing

A still further aspect of this invention utilizes the chemical methods
disclosed herein together with mass spectrometry to determine the complete
nucleotide sequence of a polynucleotide de novo. The procedure involves the same
reactions described above for variance detection; i.e., total replacement of one of
the four nucleotides in a polynucleotide with a modified nucleotide, followed by
substantially complete cleavage of the modified polynucleotide at each and every
point of occurrence of the modified nucleotide, and then determination of the masses
of the fragments obtained. In this case, however, it may be necessary to routinely
perform four sets of cleavage reactions, a different natural nucleotide being replaced
with a modified nucleotide in each reaction, so that all four natural nucleotides are in
turn replaced with modified nucleotides and the resultant modified polynucleotides
are cleaved and the masses of the cleavage products determined. It may also be
necessary to employ one or more multiple nucleotide substitutions, as discussed
above, to resolve sequencing ambiguities that may arise. While the number of
reactions necessary per sequence determination experiment is thus similar to that
required for Maxam-Gilbert or Sanger sequencing, the method of this invention has
the advantages of eliminating radiolabels or dyes, providing superior speed and
accuracy, permitting automation and eliminating artifacts, including compressions,
associated with Maxam-Gilbert and Sanger sequencing or any other gel-based
methods. This latter consideration may be of preeminent importance, because mass
spectrometry currently allows analysis of cleavage reactions in a matter of
seconds to minutes (and, in the future, milliseconds), compared to hours for current
gel electrophoretic procedures. Furthermore, the inherent accuracy of mass
spectrometry, together with the control over the construction of the modified
polynucleotide that can be achieved using the methods of this invention, will sharply
reduce the need for sequencing redundancy. A representative total sequencing
experiment is set forth in the Examples section, below.

The process of inferring DNA sequence from the pattern of masses obtained
by cleavage of analyte molecules is considerably more complicated than the process
for detecting and inferring the chemical nature of sequence variances. In the case
of sequencing by complete cleavage and mass analysis, the following must be
accomplished:

• Determine the length of the sequence. From the experimentally determined
masses, infer the nucleotide content of each cleavage fragment as discussed
elsewhere herein. This analysis is performed for each of the four sets of
experimental cleavage masses. The shortcoming of this analysis is that
two or more fragments (particularly short ones) may have identical mass, and
therefore may be counted as one, leading to an undercounting of the length
of the sequence. However, this is not a serious experimental problem, because
the fragment masses can be summed and compared for all four cleavages; if
they do not correspond, then there must be two or more overlapping masses
among the fragments. Thus, the determination of all fragment masses in all
four cleavage reactions essentially eliminates this source of potential error.
First, the set of cleavage masses that gives the greatest length can be taken
as a starting point. Next, the nucleotide content of all of the masses in the
other three cleavage reactions can be tested for compatibility with the
nucleotide content of any of the masses associated with the greatest-length
cleavage set. If they are not compatible, then there must be
undercounting even in the set associated with the greatest length.
Comparison of sequence contents will generally allow the uncounted bases to
be identified and the full length of the sequence thus to be determined.
• The next aspect of the analysis may include: (a) determining the intervals at
which A, C, G and T nucleotides must occur based on the sizes of respective
cleavage products; (b) analyzing the nucleotide content of the largest fragments
from each cleavage set to identify sets of nucleotides that belong together; (c)
comparing the nucleotide content of fragments between the different sets to
determine which fragments are compatible (i.e. one could be subsumed
within the other or they could overlap) or incompatible (no nucleotides in
common); and (d) beginning to integrate the results of these different analyses to
restrict the number of ways in which fragments can be pieced together. The
elimination of possibilities is as useful as the identification of possible
relationships. A detailed illustration of the logic required to work out the
sequence of a short oligonucleotide is provided in Example 4.

One way to provide additional information about local sequence relationships
is to reduce the extent of nucleotide substitution or the completeness of cleavage
(see below) in order to obtain sets of incompletely (but still substantially) cleaved
fragments. The mass analysis of such fragments may be extremely useful, in
conjunction with the completely cleaved fragment sets, for identifying which
fragments are adjacent to each other. Only a limited amount of such information is
needed to complete the entire puzzle of assembling the cleavage fragments into a
continuous sequence.

Three additional ways to augment the inference of DNA sequence from
analysis of complete substitution and cleavage masses are: (a) analysis of
dinucleotide cleavage masses (see below), which can provide a framework for
compartmentalizing the small masses associated with mononucleotide substitution
and cleavage into fewer intermediate size collections, and which also locates
dinucleotide sequences at intervals along the entire sequence (in fact, dinucleotide
cleavage at all possible dinucleotides is an alternate DNA sequencing method);
(b) mononucleotide substitution and cleavage of the complementary strand using
one or more modified nucleotides, which can provide valuable complementary
information on fragment length and overlaps; and (c) combination substitution and
cleavage schemes employing simultaneous di- and mononucleotide cleavages or
two different simultaneous mononucleotide cleavages, which can provide
unambiguous information on sequence order.
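The first step of the sequencing analysis above, inferring the nucleotide content of each cleavage fragment from its mass, can be sketched as a bounded search over base counts. This is a minimal sketch under stated assumptions. It uses approximate average residue masses, ignores end-group chemistry, and uses an arbitrary ±0.5 Da tolerance; a real analysis would add the appropriate terminal-group masses. For each count of A, C and G, only the nearest whole number of T residues can fit the target mass, so the search is three-dimensional rather than four. The names `compositions` and `length_range` are hypothetical.

```python
from itertools import product

RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}  # assumed, Da


def compositions(mass, tol=0.5):
    """All (A, C, G, T) counts whose summed residue mass is within tol of `mass`.

    End-group chemistry is ignored; only the residue sum is matched.
    """
    max_n = int(mass // RESIDUE_MASS["C"]) + 1   # C is the lightest residue
    hits = []
    for a, c, g in product(range(max_n + 1), repeat=3):
        partial = a * RESIDUE_MASS["A"] + c * RESIDUE_MASS["C"] + g * RESIDUE_MASS["G"]
        if partial > mass + tol:
            continue
        t = round((mass - partial) / RESIDUE_MASS["T"])   # only candidate T count
        if t >= 0 and abs(partial + t * RESIDUE_MASS["T"] - mass) <= tol:
            hits.append((a, c, g, t))
    return hits


def length_range(mass, tol=0.5):
    """Possible fragment lengths (nt) consistent with `mass`."""
    return sorted({sum(h) for h in compositions(mass, tol)})
```

For short fragments a single composition usually survives. As fragments lengthen, the number of candidates grows, which is why the cross-checks between the four cleavage sets described above matter.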

In the foregoing descriptions, it has been assumed that the modified
nucleotide is selectively more susceptible to chemical cleavage under appropriate
conditions than the three unmodified nucleotides. However, an alternative approach
to effecting mononucleotide cleavage is to use three modified nucleotides that are
resistant to cleavage under chemical or physical conditions sufficient to induce
cleavage at an unmodified, natural nucleotide. Thus, in another aspect of the
present invention, mononucleotide cleavage may be effected by selective cleavage
at an unmodified nucleotide. One chemical modification of nucleotides which has
been shown to make them more stable to fragmentation during mass spectrometric
analysis is the 2'-fluoro modification (Ono, T., et al., Nucleic Acids Research 1997,
25: 4581-4588). The utility of 2'-fluoro substituted DNA for extending the accessible
mass range for Sanger sequencing reactions (which is generally limited by
fragmentation) has been recognized, but it is an aspect of the present invention that
this chemistry also has utility in effecting nucleotide-specific cleavage by fully
substituting three modified nucleotides that are resistant to a specific physical or
chemical cleavage procedure. Another chemical modification that has been shown
to increase the stability of nucleotides during MALDI-MS is the 7-deaza analog of
adenine and guanine (Schneider, K. and Chait, B. T., Nucleic Acids Research
1995, 23: 1570-1575).

In another aspect of this invention, cleavage-resistant modified nucleotides
may be used in conjunction with cleavage-sensitive modified nucleotides to effect a
heightened degree of selectivity in the cleavage step.

iii. Applications to genotyping 

As DNA sequence data accumulate from various species, there is
increasing demand for accurate, high-throughput, automatable and inexpensive
methods for determining the status of a specific nucleotide or nucleotides in a
biological sample, where variation at a specific nucleotide (either polymorphism or
mutation) has previously been discovered. This procedure, the determination of the
nucleotide at a particular location in a DNA sequence, is referred to as genotyping.






Genotyping is in many respects a special case of DNA sequencing (or of variance
detection where only one position is being queried), but the sequence of only one
nucleotide position is determined. Because only one nucleotide position must be
assayed, genotyping methods do not entirely overlap with DNA sequencing
methods. The methods of this invention provide the basis for novel and useful
genotyping procedures. The basis of these methods is polymerization of a
polynucleotide spanning the polymorphic site. The polymerization may be either by
the PCR method or by primer extension, but is preferably by PCR. The
polymerization is performed in the presence of three natural nucleotides and one
chemically modified nucleotide, such that the chemically modified nucleotide
corresponds to one of the nucleotides at the polymorphic or mutant site. For
example, if an A/T polymorphism is to be genotyped, the cleavable nucleotide could
be either A or T. If a G/A polymorphism is to be genotyped, the cleavable nucleotide
could be either A or G. Conversely, the assay could be set up for the complementary
strand, where T and C occur opposite A and G. Subsequently the polymerization
product is chemically cleaved by treatment with acid, base or another cleavage
scheme. This results in two products from the two possible alleles, one longer than
the other as a result of the presence of the cleavable nucleotide at the polymorphic
site in one allele but not the other. A mass change, but not a length change, also
occurs on the opposite strand. One constraint is that one of the primers used for
producing the polynucleotide must be located such that the first occurrence of the
cleavable nucleotide after the end of the primer is at the polymorphic site. This
usually requires one of the primers to be close to the polymorphic site. An
alternative method is to simultaneously incorporate two cleavable nucleotides, one
for a polymorphic nucleotide on the (+) strand and one for a polymorphic site on the
(-) strand. For example, one might incorporate cleavable dA on the (+) strand (to
detect an A-G polymorphism) and cleavable dC on the (-) strand (to positively detect
the presence of the G allele on the (+) strand). In this case, it may be advantageous
to have both primers close to the variant site. The two allelic products of different
size can be separated by electrophoretic means, such as, without limitation, capillary
electrophoresis. They could also be separated by mass using, without limitation,
mass spectrometry. In addition, a FRET assay can be used to detect them, as
described below. Any of these three assay formats is compatible with multiplexing
by means known in the art.
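The primer-placement constraint and the resulting allele-specific product lengths can be checked with a short sketch. The assumptions are as follows. The primer consists of natural nucleotides and is therefore not cleaved. `extension` is the newly synthesized sequence 3' of the primer, with the polymorphic position at index `site` (the placeholder "N" below marks it). The cleaved residue is lost from the product. The function names and the example sequences are hypothetical.

```python
def primer_product(primer, extension, cleavable):
    """Primer-containing fragment left after complete cleavage at `cleavable`.

    Assumptions: the primer is made of natural nucleotides (not cleaved),
    `extension` is the newly synthesized sequence 3' of the primer, and the
    cleaved residue is lost from the product.
    """
    cut = extension.find(cleavable)
    return primer + (extension if cut == -1 else extension[:cut])


def design_ok(extension, site, alleles, cleavable):
    """The cleavable base must be one allele and must not occur before `site`."""
    return cleavable in alleles and cleavable not in extension[:site]


def allele_products(primer, extension, site, alleles, cleavable):
    """Map each allele to its predicted primer-containing product."""
    return {
        allele: primer_product(primer, extension[:site] + allele + extension[site + 1:], cleavable)
        for allele in alleles
    }
```

For an A/G polymorphism with cleavable dA, `allele_products("GGCCTG", "CTGNTCAGG", 3, ("A", "G"), "A")` predicts a 9 nt product for the A allele, cut at the polymorphic site, and a 12 nt product for the G allele, cut at the next A downstream.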

One way to perform FRET detection of the presence or absence of the
allelic cleavage product is to introduce a probe carrying a fluor or a quencher moiety
such that the probe hybridizes differentially to the cleaved strand (representing one
allele) versus the non-cleaved strand (representing the other allele; see Fig. 2 for
illustration of several possible schemes). Such differential hybridization is readily
achievable because one strand is longer than the other by at least one, and often
several, nucleotides. If a fluor or quenching group is also placed on the primer used
to produce the cleavable polynucleotide (by PCR or primer extension) such that an
appropriate FRET interaction between the moiety on the probe and the moiety on
the primer exists, i.e., the absorbing and emitting wavelengths of the two moieties
are matched, and the distance and orientation between the two moieties are
optimized by methods known to those skilled in the art, then a strong signal will be
present with one allele but not the other when the probe and primer are heated to
the temperature that affords maximal hybridization discrimination. Ideally the probe
is synthesized in a manner that takes maximal advantage of the different lengths of
the cleaved and non-cleaved alleles. For example, the probe should hybridize to the
region that is removed by cleavage in one allele but is present in the other allele.
When selecting primers for the PCR or primer extension, one experimental design
consideration would be to locate the primer so as to maximize the length difference
between the two alleles. Other means of maximizing the discrimination would
include the use of a "molecular beacon" strategy, where the ends of the probe are
complementary and form a stem, except in the presence of the non-cleaved allele,
where the non-cleaved segment is complementary to the stem of the probe and
therefore effectively competes with the formation of intramolecular stems in the
probe molecule (Figs. 32 and 33).

The above FRET methods can be performed in a single tube, for example, as
follows: (1) PCR; (2) addition of cleavage reagent (and heat if necessary); (3)
addition of the probe; and (4) temperature ramping if necessary in an instrument
such as the ABI Prism, which is capable of excitation and fluorescence detection in
96 wells.
Another way to produce a FRET signal that discriminates the two variant
alleles is to incorporate a nucleotide with a dye that interacts with the dye on the
primer. The key to achieving differential FRET is that the dye-modified nucleotide
must first occur (after the 3' end of the primer) beyond the polymorphic site so that,
after cleavage, the nucleotide dye of one allele (cleaved) will no longer be within
the requisite resonance-producing distance of the primer dye while, in the other
(uncleaved) allele, the proper distance will be maintained and FRET will occur. The
only disadvantage of this method is that it requires a purification step to remove
unincorporated dye molecules that can produce a background signal which might
interfere with the FRET detection. A non-limiting example of the experimental steps
involved in carrying out this method is: (1) PCR with a dye-labeled primer and either
a cleavable modified nucleotide that also carries a dye or one cleavable modified
nucleotide and one dye-labeled nucleotide. The dye can be on the cleavable
nucleotide if the cleavage mechanism results in separation of the dye from the
primer as, for instance, in the case of 5'-amino substitution, which results in cleavage
proximal to the sugar and base of the nucleotide; (2) cleavage at the cleavable
modified nucleotide; (3) purification to remove free nucleotides; and (4) FRET
detection.

As noted earlier in this disclosure, we have demonstrated that polynucleotides
containing 7-nitro-7-deaza-2'-deoxyadenosine in place of 2'-deoxyadenosine may be
specifically and completely cleaved using piperidine/TCEP/Tris base. There are
many other examples of chemistries where such PCR amplification and chemical
cleavage may be possible. In a putative genotyping assay, a PCR reaction is
carried out with one cleavable nucleotide analogue along with three other
nucleotides. The PCR primers may be designed such that the polymorphic base is
near one of the primers (P) and there is no cleavable base between the primer and
the polymorphic base. If the cleavable base is one of the polymorphic bases, the P-
containing cleavage product from this allele is expected to be shorter than the
product from the other allele. The schematic presentation (Fig. 27) and experimental
data (Figs. 28 to 31) are examples of this arrangement. If the cleavable base is
different from either of the polymorphic bases, the P-containing fragment would have
the same length, but a different molecular weight, for the two alleles. In this case,
mass spectrometry would be the preferred analytical tool, although we have
observed that oligonucleotides with a single base difference may migrate
differently when analyzed by capillary electrophoresis. In one specific example, an
82 bp fragment of the transferrin receptor gene was amplified by PCR using
7-nitro-7-deaza-2'-deoxyadenosine in place of 2'-deoxyadenosine. The polymorphic base
pair is A:T to G:C. The PCR amplification generated fully substituted product in
similar yields to that of natural DNA (Figure 28). MALDI-TOF mass spectrometry
analysis revealed the polymorphism in two regions of the spectra: the first between
7000 Da and 9200 Da and the second between 3700 Da and 4600 Da (Figure 30,
panel A). The first region demonstrated the difference in primer-containing
fragments of different lengths (Figure 30, panel B). The second region showed the
opposite strand of DNA containing the polymorphism, which has the same length but
a different mass (Figure 30, panel C). The common fragments between the two
alleles may serve as mass references. Capillary electrophoresis analysis may also
be used (Fig. 31). The mobility difference between the two fragments of different
length was easily detected in the test sample, as expected. In addition, a mobility
difference between two polymorphic fragments (11 nt) of the same length but one
different base (C vs. T) was observed, providing supporting evidence from the
opposite strand. Fig. 32 illustrates schemes for FRET detection of the same
polymorphic site.

b. Full substitution, full extension and complete cleavage at dinucleotides

In another aspect of the present invention, two of the four nucleotides of
which the subject polynucleotide is composed are completely replaced with modified
nucleotides (either on one strand using primer extension, or on both strands using a
DNA amplification procedure) and substantially complete cleavage is then effected
preferentially at the site of dinucleotides involving the two different modified
nucleotides. Generally, given the steric constraints of most cleavage mechanisms,
the two modified nucleotides will be cleaved only when they occur in a specific order.
For example, if T and C are modified, the sequence 5' TpC 3' would be cleaved but
5' CpT 3' would not (5' and 3' indicate the polarity of the polynucleotide strand and p
indicates an internal phosphate group).
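Order-specific dinucleotide cleavage can be modeled as a cut at the internal phosphate of every occurrence of the ordered 5'->3' dinucleotide, and nowhere else. The minimal sketch below assumes that no nucleotide is lost at the cut. Overlapping occurrences (e.g. TCTC) are each cleaved, and the reverse order (CpT for a TpC scheme) is left intact. The function name is hypothetical.

```python
def cleave_at_dinucleotide(seq, dinuc):
    """Cut between the two residues of every occurrence of the ordered 5'->3' `dinuc`.

    Assumption: cleavage is at the internal phosphate (the "p" in TpC) and no
    nucleotide is lost, so the fragments concatenate back to `seq`.
    """
    fragments, start = [], 0
    i = seq.find(dinuc)
    while i != -1:
        fragments.append(seq[start:i + 1])   # 5' residue stays on the upstream fragment
        start = i + 1
        i = seq.find(dinuc, i + 1)           # step by one so overlapping sites are found
    fragments.append(seq[start:])
    return fragments
```

For example, cleaving AATCGGTCA at TpC yields AAT, CGGT and CA, whereas AACTGG (containing only CpT) is left uncut.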
The rationale for dinucleotide cleavage is that mononucleotide cleavage is not
ideally suited to the analysis of polynucleotides longer than 300 to 400 nucleotides,
because the number of fragments that must be detected and resolved by the mass
spectrometer may become limiting, and the likelihood of coincidental occurrence of
two or more cleavage fragments with the same mass increases and begins to limit
the efficiency of the method. This latter problem is especially acute with respect to
the occurrence of mono-, di-, tri- and tetranucleotides of the same composition, which
can mask the appearance or disappearance of fragments because MS is not
quantitative. In contrast, capillary electrophoresis, while not providing mass and
thereby nucleotide content, is a quantitative method that allows detection of variation
in the numbers of di-, tri- and tetranucleotides.

Cleavage at modified dinucleotides should result in fragments averaging
sixteen nucleotides in length. This is because, given four nucleotides, any particular
dinucleotide is expected to occur once in every 4² = 16 positions, assuming
nucleotide frequencies are equal and there is no biological selection imposed on any
class of dinucleotides (i.e. their occurrence is random). Neither of these assumptions
is completely accurate, however, so there will in actuality be a wide size distribution
of cleavage masses, with considerable deviation in the average fragment size
depending on which nucleotide pair is selected for substitution and cleavage.
However, available information concerning the frequency of various dinucleotides in
mammalian, invertebrate and prokaryotic genomes can be used to select appropriate
dinucleotides. It is well known, for example, that 5' CpG 3' dinucleotides are
underrepresented in mammalian genomes; they can be avoided if relatively frequent
cleavage intervals are desired.
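When the target sequence is known, the expected mean fragment length for each candidate dinucleotide can be computed directly from its observed count, rather than relying on the 1-in-16 random expectation. This is a minimal sketch, assuming every occurrence is cleaved and no nucleotide is lost. With n occurrences there are n + 1 fragments, so the mean length is len(seq)/(n + 1). The target length of 16 and the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def dinucleotide_counts(seq):
    """Occurrences of each ordered dinucleotide (overlapping windows)."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def expected_mean_length(seq, dinuc):
    """Mean fragment length if every occurrence of `dinuc` is cleaved."""
    return len(seq) / (dinucleotide_counts(seq)[dinuc] + 1)


def rank_dinucleotides(seq, target=16.0):
    """All 16 dinucleotides, ordered by closeness of mean fragment length to `target`."""
    dinucs = ["".join(p) for p in product("ACGT", repeat=2)]
    return sorted(dinucs, key=lambda d: abs(expected_mean_length(seq, d) - target))
```

In a mammalian sequence, CG would typically score poorly for frequent cleavage because its low count gives long fragments, consistent with the CpG underrepresentation noted above.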

i. Applications to variance detection

If the sequence of the analyte polynucleotide is known, then an optimal
dinucleotide cleavage scheme can be selected based on analysis of the masses of
predicted cleavage fragments. For example, cleavage fragments that fall within the
size range optimal for analysis by mass spectrometry can be selected by analysis of
the fragment sizes produced by all possible dinucleotide cleavage schemes.
Further, the theoretical efficiency of variance detection associated with all possible
dinucleotide cleavage schemes can be determined as described above for full
mononucleotide substitution and cleavage, that is, by determining the detectability
of every possible nucleotide substitution in the entire analyte fragment. In some
cases two or more independent dinucleotide cleavage reactions may produce
complementary results, or a second dinucleotide cleavage experiment may be run to
provide corroboration.

Given the average length of dinucleotide cleavage fragments (about 16 nucleotides),
it will often not be possible to determine with precision the location of a variant
nucleotide from a single dinucleotide cleavage experiment. For example, if a 15 Dalton
mass difference between samples is detected in a 14mer, then there must be a C <-> T
variance (Table 2) in the 14mer, with the heavier alleles containing T at a position where
the lighter alleles contain C. However, unless there is only one C in the lighter variant
fragment, or only one T in the heavier variant fragment, it is impossible to determine
which C (or T) is the variant one. This ambiguity regarding the precise nucleotide
which varies can be resolved in several ways. First, a second mono- or dinucleotide
substitution and cleavage experiment, or a combination of such cleavage
experiments, may be designed so as to divide the original variant fragment into
pieces that will allow unambiguous assignment of the polymorphic residue. Second,
an alternative sequencing procedure, such as Sanger sequencing or sequencing by
hybridization, may be used as an independent check on the results.
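The reasoning in the C <-> T example can be expressed in a few lines of Python. The sketch below uses the commonly quoted average residue masses of the four deoxynucleotides within a DNA strand (about 313.21, 289.18, 329.21 and 304.20 Da for A, C, G and T), which reproduce the 9, 15 and 40 Da differences cited in this section; Table 2 itself is not reproduced. It looks up which single substitution could explain an observed mass shift and then lists every position in the lighter allele's fragment that could carry it. The 0.5 Da tolerance, the function names and the example 14mer are hypothetical, and only one substitution per fragment is considered.

```python
# Average residue masses (Da) of deoxynucleotides within a DNA strand.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def candidate_substitutions(delta: float,
                            tol: float = 0.5) -> list[tuple[str, str]]:
    """Return (lighter_base, heavier_base) pairs whose mass difference
    matches the magnitude of the observed shift within tol."""
    hits = []
    for light, m_light in RESIDUE_MASS.items():
        for heavy, m_heavy in RESIDUE_MASS.items():
            if m_heavy > m_light and abs((m_heavy - m_light) - abs(delta)) <= tol:
                hits.append((light, heavy))
    return hits


def candidate_positions(lighter_fragment: str, light_base: str) -> list[int]:
    """1-based positions in the lighter allele's fragment that could be the variant."""
    return [i + 1 for i, base in enumerate(lighter_fragment) if base == light_base]


if __name__ == "__main__":
    shift = 15.0                      # observed mass difference between alleles
    fragment = "ACCTGACTTGCAGT"       # hypothetical lighter-allele 14mer
    for light, heavy in candidate_substitutions(shift):
        print(f"{light} -> {heavy} at positions",
              candidate_positions(fragment, light))
```

For this hypothetical fragment the 15 Da shift maps only to C -> T, but any of the four C residues (positions 2, 3, 7 and 11) could be the variant, which is the ambiguity described above.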

ii. Applications to DNA sequencing

As a stand-alone procedure, dinucleotide substitution and cleavage can
provide useful information concerning the nucleotide content of DNA fragments
averaging about 16 nucleotides in length, but ranging up to 30, 40 or even 50 or
more nucleotides. However, as described above, the main applications of
dinucleotide cleavage to DNA sequencing occur in conjunction with mononucleotide
cleavage. The comparatively large DNA fragments produced by dinucleotide
cleavage can be very useful in sorting the smaller fragments produced by
mononucleotide cleavage into sets of fragments which must fit together. The
additional constraints imposed by these groupings can be sufficient to allow the
complete sequence to be determined even for relatively large fragments.

In Example 4, the steps required to infer a nucleotide sequence from a 20mer
using four mononucleotide substitution and cleavage reactions are shown. The
procedures described in Example 4 could be carried out on a series of 10- to 30mers,
the sequence content of which was initially defined, or at least constrained, by a
dinucleotide cleavage procedure. In this way, the sequence of a much larger fragment
can be obtained. Note that as nucleotide length increases, the relationship between
fragment mass and sequence content becomes more ambiguous; that is, there are
more and more possible sequences that could produce a given mass. However, if
the number of nucleotides comprising the mass is known, the number of possible
nucleotide contents falls significantly (Pomerantz, S.C., et al., J. Am. Soc. Mass
Spectrom., 1993, 4: 204-209). In addition, sequence constraints, such as the lack of
internal dinucleotide sequences of a particular type, further reduce the number of
possible nucleotide contents, as illustrated in Table 4 for mononucleotide sets.
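To see why knowing the number of nucleotides helps, one can enumerate every base composition (numbers of A, C, G and T) whose summed residue masses fall within a tolerance of a measured mass, once with the length fixed and once with the length free. The sketch below is a simplification under stated assumptions: it ignores terminal groups, counter-ions and the dinucleotide constraints of Table 4, uses commonly quoted average residue masses for DNA, and the tolerance, length bound and function name are hypothetical.

```python
from typing import Optional

# Average residue masses (Da) of deoxynucleotides within a DNA strand.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def compositions(target: float, tol: float, length: Optional[int] = None,
                 max_len: int = 40) -> list[tuple[int, int, int, int]]:
    """Enumerate (nA, nC, nG, nT) whose residue-mass sum is within tol of target.

    If length is given, only compositions of exactly that many nucleotides
    are considered; otherwise every length from 1 to max_len is tried.
    Terminal groups are ignored (simplifying assumption).
    """
    lengths = [length] if length is not None else range(1, max_len + 1)
    m_a, m_c, m_g, m_t = (RESIDUE_MASS[b] for b in "ACGT")
    hits = []
    for n in lengths:
        for a in range(n + 1):
            for c in range(n + 1 - a):
                for g in range(n + 1 - a - c):
                    t = n - a - c - g
                    mass = a * m_a + c * m_c + g * m_g + t * m_t
                    if abs(mass - target) <= tol:
                        hits.append((a, c, g, t))
    return hits


if __name__ == "__main__":
    measured = 5 * sum(RESIDUE_MASS.values())   # hypothetical A5C5G5T5 20mer
    free = compositions(measured, tol=2.0)
    fixed = compositions(measured, tol=2.0, length=20)
    print(len(free), "compositions with unknown length;",
          len(fixed), "with length fixed at 20")
```

The fixed-length list is always a subset of the free-length list, so any additional constraint on length can only shrink the set of candidate compositions.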
c. Full substitution with modified nucleotide and partial cleavage 
Partial substitution with modified nucleotide and full cleavage 
Partial substitution with modified nucleotide and partial cleavage 

These applications provide partially cleaved polynucleotides by different
strategies; each of these procedures has utility in specific embodiments of the
invention. However, full substitution with a modified nucleotide and partial cleavage
is the preferred method of producing partial cleavage products for mass
spectrometric analysis. The reason is that with full substitution one can vary the
degree of partial cleavage over a very wide spectrum, from cleavage of 1 in 100
nucleotides to cleavage of 99 in 100 nucleotides. Partial substitution, even with full
cleavage, does not allow this range of cleavage completeness. However, for
modified nucleotides which are not efficiently incorporated by polymerases, lesser
degrees of substitution are preferred. As the completeness of cleavage is reduced,
relationships between cleavage fragments become evident over a longer and longer
range. On the other hand, as the completeness of cleavage is increased, the ability
to obtain precise mass data and unambiguous assignment of nucleotide content
increases. The combination of slight, intermediate and substantial cleavage provides
an integrated picture of an entire polynucleotide, whether the application is variance
detection or sequencing. The small polynucleotides of defined nucleotide content can
be joined into larger and larger groups of defined order.
Partial substitution with full cleavage and partial substitution with partial
cleavage are useful for the preparation of sequencing ladders. If a modified
nucleotide is not efficiently incorporated into polynucleotides by available
polymerases, then a low ratio of partial substitution may be optimal for efficient
production of polynucleotides containing the modified nucleotide. However, a low
degree of substitution may then require complete cleavage in order to produce
sufficient cleavage fragments for ready detection.

Partial substitution with partial cleavage is generally a preferred approach, as
conditions for complete cleavage may be harsh and thereby result in some
nonspecific cleavage or modification of polynucleotides. Also, partial substitution at
relatively high levels (i.e., at 5% or more of the occurrences of the nucleotide) allows
a range of partial cleavage efficiencies to be analyzed. As with MS analysis, there
are advantages to being able to test multiple degrees of cleavage. For example, it is
well known in Sanger sequencing that there are tradeoffs in the production of very long
sequence ladders: generally the beginning of the ladder, with the shortest
fragments, is difficult to read, as is the end of the ladder with the longest fragments.
Similarly, the ability to manipulate partial cleavage conditions with the
polynucleotides of this invention will allow a series of sequencing ladders to be
produced from the same polynucleotide that provide clear sequence data close to
the primer or at some distance from the primer. As shown in Fig. 17, sequence
ladders produced by chemical cleavage have a much better distribution of labeled
fragments than those produced by dideoxy termination over distances up to 4 kb and
beyond.
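The trade-off between short and long ladders can be illustrated with a simple end-labeled model. Assuming, hypothetically, that each cleavable site is cut independently with probability p, the labeled fragment ends at the k-th site from the label with probability p(1 - p)^(k - 1), and the molecule remains uncut with probability (1 - p)^n for n sites. The sketch below computes this profile; it is an idealization (uniform site reactivity, no nonspecific cleavage), the function names are illustrative, and it is not intended to reproduce Fig. 17.

```python
def ladder_profile(n_sites: int, p: float) -> tuple[list[float], float]:
    """Expected fraction of end-labeled molecules whose labeled fragment ends
    at each cleavable site 1..n_sites (counted from the label), plus the
    fraction left uncleaved. Assumes independent cleavage with probability p."""
    bands = [p * (1 - p) ** (k - 1) for k in range(1, n_sites + 1)]
    uncut = (1 - p) ** n_sites
    return bands, uncut


def fraction_beyond(n_sites: int, p: float, site: int) -> float:
    """Fraction of labeled signal in bands past a given site (uncut excluded)."""
    bands, _ = ladder_profile(n_sites, p)
    return sum(bands[site:])


if __name__ == "__main__":
    # Lighter cleavage shifts labeled signal toward sites far from the label.
    for p in (0.2, 0.05, 0.01):
        print(p, round(fraction_beyond(200, p, 50), 3))
```

Under this model, raising p concentrates signal in short fragments near the primer, while lowering p moves it toward distant sites, which is why a series of cleavage conditions applied to the same polynucleotide can cover both regions.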

Partial cleavage may also be obtained by substituting the cleavage-resistant
modified nucleotides described above for all but one natural nucleotide,
which then provides the cleavage sites. In addition, as described previously,
combinations of cleavage-resistant modified nucleotides and cleavage-sensitive
modified nucleotides may be used.

While any technique which permits the determination of the mass of relatively
large molecules without causing non-specific disintegration of the molecules in the
process may be used with the methods of this invention, a preferred technique is
MALDI mass spectrometry, since it is well suited to the analysis of complex mixtures
of analytes. Commercial MALDI instruments are available which are capable of
measuring mass with an accuracy on the order of 0.1% to 0.05%. That is, these
instruments are capable of resolving molecules differing in molecular weight by as
little as one part in two thousand under optimal conditions. Advances in MALDI MS
technology will likely increase the resolution of commercial instruments in the next
few years. Consider the smallest difference that can occur between two strands
containing a variance (an A-T transversion, a molecular weight difference of 9; see
Table 5). Given a MALDI apparatus with a resolution of 2,000 (that is, a machine
capable of distinguishing an ion with an m/z (mass/charge) of 2,000 from an ion with
an m/z of 2,001), the largest DNA fragment in which an A-T transversion would be
detectable is approximately 18,000 Daltons, i.e., 9 x 2,000 (a 'Dalton' is a unit of
molecular weight used when describing the size of large molecules; for all intents and
purposes, a mass expressed in Daltons is equivalent to the molecular weight of the
molecule). In the experimental setting, the practical resolving power of an instrument
may be limited by the isotopic heterogeneity of carbon (carbon exists in nature as
carbon-12 and carbon-13), as well as by other factors. Assuming an approximately
even distribution of the four nucleotides in the DNA fragment, the 18,000 Dalton limit
translates to detection of an A-T transversion in an oligonucleotide containing about
55 nucleotides. At the other end of the spectrum, a single C-G transversion, which
results in a molecular weight difference of 40, could be detected using MALDI mass
spectrometry in an oligonucleotide consisting of about 246 nucleotides. The size of an
oligonucleotide in which an A-T transversion would be detectable could be increased
by substituting a heavier non-natural nucleotide for either the A or the T; for example,
without limitation, replacing A with 7-methyl-A, thus increasing the molecular weight
change to 23. Table 5 shows the approximate size of an oligonucleotide in which each
possible single point mutation could be detected by mass spectrometers of different
resolving powers without any modification of molecular weight.
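The resolution arithmetic above follows from R = m / Δm: an instrument of resolving power R separates a mass difference Δm only for ions up to m = R x Δm. The sketch below reproduces the 18,000 Da figure and converts mass limits to approximate lengths. The mean residue mass is left as an explicit parameter because the text does not state the value it used; the quoted lengths of about 55 and 246 nucleotides correspond to roughly 325 to 330 Da per nucleotide, whereas an equimolar average of commonly quoted DNA residue masses (about 309 Da) would give somewhat longer limits. Function names are illustrative.

```python
# Average residue masses (Da) of deoxynucleotides within a DNA strand.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def max_detectable_mass(delta_mass: float, resolution: float) -> float:
    """Largest ion mass at which delta_mass is still resolved (m <= R * delta_m)."""
    return delta_mass * resolution


def max_detectable_length(delta_mass: float, resolution: float,
                          mean_residue_mass: float) -> float:
    """Approximate fragment length (nt) corresponding to max_detectable_mass."""
    return max_detectable_mass(delta_mass, resolution) / mean_residue_mass


if __name__ == "__main__":
    equimolar = sum(RESIDUE_MASS.values()) / 4    # about 309 Da
    for label, delta in (("A<->T", 9.0), ("C<->G", 40.0), ("T<->7-methyl-A", 23.0)):
        print(label,
              max_detectable_mass(delta, 2000),
              round(max_detectable_length(delta, 2000, equimolar)))
```

Passing about 327 Da as the mean residue mass recovers the 55-nucleotide estimate for the A-T case, and about 325 Da recovers the 246-nucleotide estimate for the C-G case.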

A variety of chemical modifications of nucleotides have been described with
respect to their utility in increasing the detectability of mass differences during MS
analysis. A particularly useful mass modification for use with the methods of this
invention is the purine analog 2-chloroadenine, which has a mass of 364.5. As
shown in Table 2, Panel B, this has a favorable effect on the mass differences between
A and each of the other nucleotides. Most importantly, it changes the T-A difference
from 9 Da to 42.3 Da. Further, it has been shown that 2-chloroadenine can be
incorporated into polynucleotides by the DNA polymerase from Thermus aquaticus,
and full substitution on one strand has been described (Hentosh, P., Anal. Biochem.,
1992, 201: 277-281).
E. Examples 
1. Polymerase Development

A variety of mutant polymerases have been shown to have altered catalytic
properties with respect to modified nucleotides. Mutant polymerases with reduced
discrimination between ribonucleotides and deoxyribonucleotides have been
extensively studied. Human DNA polymerase β mutants that discriminate against
azidothymidine (AZT) incorporation have been isolated by genetic selection. Thus, it
is highly likely that mutant polymerases capable of incorporating any of the modified
nucleotides of this invention better than natural polymerases can be produced and
selected.

The following procedure can be employed to obtain an optimal polymerase
for incorporation of a particular modified nucleotide or nucleotides into a
polynucleotide. It is understood that modifications of the following procedure will be
readily apparent to those skilled in the art; such modifications are within the scope
of this invention.

a. A starting polymerase is selected. Alternatively, multiple
polymerases that have different sequences and/or different capabilities with regard
to incorporation of a modified nucleotide or nucleotides into a polynucleotide might
be selected. For example, without limitation, two polymerases might be selected, one
of which efficiently incorporates a nucleotide having a sugar modification and the other
of which efficiently incorporates a nucleotide having a phosphate backbone
modification. The coding sequences of the polymerase(s) are then cloned into a
prokaryotic host.

It may be advantageous to incorporate a protein tag in the polymerase
during cloning, the protein tag being selected for its ability to direct the polymerase
into the periplasmic space of the host. An example of such a tag, without limitation,
is thioredoxin. Proteins in the periplasmic space can be obtained in a semi-pure
state by heat shock (or other procedures known in the art) and are less likely to be
incorporated into inclusion bodies.
b. Several (preferably three or more) rounds of shuffling
(Stemmer, supra) are then performed.

c. After each round of shuffling, the shuffled DNA is transformed
into a host. The library of transformants obtained is then plated, and pools of
transformants (approximately 10-1,000 colonies per pool) are prepared from the
host cell colonies for screening by sib selection. A lysate is then made from each
pool. The host may be a prokaryote such as, without limitation, a bacterium, or a
single-celled eukaryote such as a yeast. The following description assumes the use
of a bacterial host, but other possible hosts will be apparent to those skilled in the
art and are within the scope of this invention.

d. The lysates are subjected to dialysis using a low molecular
weight cut-off membrane to remove substantially all natural nucleotides. This is
necessary because the assay for a polymerase with the desired characteristics entails
polymerase extension of a primer in the presence of modified nucleotides. The
presence of the corresponding natural nucleotides would result in a high background
in the assay which might obscure the results. An alternative procedure is degradation
of all natural nucleotides with a phosphatase such as shrimp alkaline phosphatase.

e. Add the following to the dialyzed lysate: a single-stranded DNA
template, a single-stranded DNA primer complementary to one end of the template,
the modified nucleotide or nucleotides whose incorporation into the DNA is desired,
and the natural nucleotides which are not being replaced by the modified
nucleotides. If the desired polymerase is to have the capability of incorporating two
contiguous modified nucleotides, then the template should be selected to contain
one or more complementary contiguous sequences. For example, without limitation,
if a polymerase which is capable of incorporating a modified-C-modified-T sequence
5' to 3' is desired, the template should contain one or more G-A or A-G sequences 3'
to 5'. Following (that is, 5' to) the segment of the template strand designed to test
the ability of the polymerase to incorporate the modified nucleotide or nucleotides is
a segment of template strand that produces a detectable sequence when copied by
the polymerase. The sequence can be detected in several ways. One possibility is
to use a template having a homopolymeric segment of nucleotides complementary
to one of the natural nucleotides. Then, if the goal is, for example, identification of a
polymerase that incorporates modified C, detection might entail polymerization
of a consecutive series of A, G or T, provided, however, that the nucleotide used for
detection does not occur earlier in the polymerized sequence complementary to the
template sequence. The detection nucleotide could be a radiolabeled or dye-labeled
nucleotide that would only be incorporated by a mutant polymerase that had
already traversed the segment of template requiring incorporation of the modified
nucleotide(s). Another way to detect the homopolymer would be to make a
complementary radiolabeled or dye-labeled probe that could be hybridized to the
homopolymer produced only in those pools containing a polymerase capable of
incorporating the modified nucleotide(s). Hybridization could then be detected by,
for example, spotting the primer extension products from each pool on a nylon filter,
followed by denaturing, drying and addition of the labeled homopolymeric probe,
which would hybridize to the complementary strand of the polymerization product.
Of course, a homopolymer or other sequence not present in the host cell genome or
an episome should be used to minimize background hybridization to host
sequences present in all the pools.
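The design rule in step e, that the detection nucleotide must not be needed anywhere before the reporter homopolymer, can be checked mechanically. The sketch below assumes, for simplicity, that the template region to be copied is written 3' to 5' in the order the polymerase reads it, so the product (5' to 3') is the base-by-base complement. The function names, the `reporter_start` index and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extension_product(template_3to5: str) -> str:
    """Product strand (5'->3') copied from a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def reporter_design_ok(template_3to5: str, reporter_start: int,
                       modified_base: str, detection_base: str) -> bool:
    """Check a test/reporter template layout.

    The product upstream of reporter_start must require the modified base
    and must not contain the detection base; the reporter segment must be a
    homopolymer of the detection base.
    """
    product = extension_product(template_3to5)
    test_part, reporter = product[:reporter_start], product[reporter_start:]
    return (modified_base in test_part
            and detection_base not in test_part
            and len(reporter) > 0
            and set(reporter) == {detection_base})


if __name__ == "__main__":
    # Test segment copies to CCGCC (modified C); reporter copies to poly(A).
    print(reporter_design_ok("GGCGG" + "T" * 8, 5, "C", "A"))   # True
    # A labeled A inside the test segment would give signal before the
    # whole segment has been traversed.
    print(reporter_design_ok("GTCGG" + "T" * 8, 5, "C", "A"))   # False
```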

Yet another detection procedure would be to incorporate into the template a
sequence corresponding to an RNA polymerase promoter, such as, without limitation,
the T7 promoter, followed by a reporter sequence. These sequences
should be located downstream of (3' to) the primer and the template sequence
requiring incorporation of modified nucleotides. The T7 promoter will be inactive until
it becomes double-stranded as a consequence of the polymerization; however,
polymerization across the T7 promoter sequence will only occur if the mutant
polymerase being tested is capable of incorporating the modified nucleotide or
sequence of modified nucleotides which lie upstream of the T7 promoter sequence.
The reporter sequence may include a homopolymeric sequence of a nucleotide (e.g.,
T), the complement of which (in this case, A) is labeled with a dye or radioactive
label. In this manner, high levels of T7 polymerase-mediated transcription will result
in large quantities of high molecular weight (i.e., capable of precipitation by
trichloroacetic acid), labeled polymer. An alternative reporter sequence might be a
ribozyme capable of cleaving an exogenously added marker oligonucleotide designed
to permit easy distinction of cleaved from non-cleaved products. For example, again
without limitation, one end of the oligonucleotide might be biotinylated and the other
end might contain a fluorescent dye. Such systems are capable of 1000-fold or
greater amplification of a signal. In this approach it would first be necessary to
demonstrate that the function of the promoter is not disturbed by the presence of the
modified nucleotide, or to create a version of the promoter that lacks the nucleotide
being modified.

f. Any pool of lysed bacterial colonies which contains a polymerase
capable of incorporating the selected modified nucleotide or contiguous modified
nucleotides will produce detectable homopolymer, or will contain a double-stranded T7
RNA polymerase promoter upstream of a marker sequence as the result of
polymerization across the modified nucleotide or contiguous nucleotides, across the
T7 promoter and across the marker sequence. Addition of T7 RNA polymerase to the
mixture (or, alternatively, expression of T7 RNA polymerase from a plasmid) will result
in transcription of the marker sequence, which then can be detected by an appropriate
method depending on the marker system selected. It may not be necessary to select
or design a promoter which either lacks the modified nucleotide(s) or which can
function effectively with the modified nucleotide(s).

g. Bacterial colonies containing a polymerase having the desired
properties are then identified and purified from pools of bacterial colonies by sib
selection. In each round of selection, the pool or pools with the desired properties are
split into sub-pools and each sub-pool is tested for activity as set forth above. The
sub-pool displaying the highest level of activity is selected and separated into a
second round of sub-pools, and the process is repeated until there is only one
colony remaining which contains the desired polymerase. That polymerase can then
be recloned into a protein expression vector, and large amounts of the polymerase
can be expressed and purified.
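The sib-selection loop in step g can be simulated to estimate how many rounds of pooling are needed. In the sketch below, `activity` is a hypothetical stand-in for the pooled-lysate assay: it returns a score for a pool, and the highest-scoring sub-pool is carried forward, as in step g. It assumes a noise-free assay and roughly equal-sized sub-pools; under those assumptions, N colonies split into k sub-pools per round need about log base k of N rounds (rounded up).

```python
from typing import Callable, Hashable, Sequence


def sib_select(colonies: Sequence[Hashable],
               activity: Callable[[Sequence[Hashable]], float],
               n_subpools: int = 10) -> tuple[Hashable, int]:
    """Repeatedly split the current pool into sub-pools, keep the most active
    one, and stop when a single colony remains. Returns (colony, rounds)."""
    if n_subpools < 2:
        raise ValueError("need at least two sub-pools per round")
    pool = list(colonies)
    rounds = 0
    while len(pool) > 1:
        size = -(-len(pool) // n_subpools)          # ceiling division
        subpools = [pool[i:i + size] for i in range(0, len(pool), size)]
        pool = max(subpools, key=activity)          # sub-pool with highest signal
        rounds += 1
    return pool[0], rounds


if __name__ == "__main__":
    # Hypothetical library of 1,000 colonies in which only colony 437 is active.
    signal = {c: (1.0 if c == 437 else 0.0) for c in range(1000)}
    colony, rounds = sib_select(range(1000),
                                lambda pool: sum(signal[c] for c in pool))
    print(colony, rounds)   # 437 3
```

With the 10-1,000 colony pools described in step c and ten sub-pools per round, this model suggests at most about three rounds per positive pool; a real assay with background signal may need replicate testing at each round.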

Another approach to polymerase development exploits the well-known
propensity of some antibiotics to kill only growing cells, e.g., penicillin and related
drugs, which kill by interfering with bacterial cell wall synthesis in growing cells but do
not affect quiescent cells.

The approach would be to introduce a modified nucleotide into bacterial cells
which have been genetically altered to express one or more mutant polymerases,
preferably a library of mutant polymerases. An ideal host strain would be one in which
the endogenous polymerase has been inactivated but is complemented by a
plasmid-encoded polymerase. A library of polymerases could then be created on a
second plasmid with a different selectable marker, e.g., antibiotic resistance. The
library would then be introduced into the host cell in the presence of negative
selection against the first (non-mutated) polymerase-encoding plasmid, leaving cells
with only the mutant polymerases. If one or more of the mutant polymerases is
capable of incorporating the modified nucleotide into the genetic material of the cells,
the expression of the modified gene(s) will be altered and/or a series of host cell
responses which affect cell growth, such as the SOS response, will be elicited. The
effect sought would be reversible growth arrest, i.e., a cytostatic rather than cytocidal
effect. The cells would then be treated with an antibiotic which only kills actively
growing cells. The cells are then removed from the presence of the antibiotic and
placed in fresh growth medium. Any cells whose growth was arrested by the
incorporation of the modified nucleotide into their genetic material, and which
therefore were unaffected by the antibiotic, would form colonies. Plasmids
containing the code for the polymerase which catalyzed the incorporation of the
modified nucleotide into the cells' genetic material are then isolated and the procedure
repeated for additional rounds of selection. Once a sufficient number of selection
rounds have been performed, the polymerase is isolated and characterized. An
exemplary, but by no means limiting, experimental procedure which might be
employed to accomplish the foregoing is as follows:

1. Select a polymerase or set of polymerases for mutagenesis. The starting
polymerase(s) may include, without limitation, a mutant polymerase such as Klenow
E710A, wild-type polymerases, thermostable or thermolabile polymerases, or
polymerases known to complement E. coli DNA Pol I, etc.

2. Prepare a library of mutant polymerases using techniques such as "dirty
PCR," shuffling, site-directed mutagenesis or other diversity-generating procedures.

3. Clone the library into a plasmid vector. 

4. Transform bacteria with the plasmid library and isolate transformants by
selection on an appropriate antibiotic. Preferably, the host strain has an inactivated
chromosomal polymerase, and selection can be applied to ensure that only the mutant
polymerases are expressed in the host cells, as described above. Only cells
harboring plasmids encoding functional polymerases will survive this step.

5. Add the modified nucleotide triphosphate to the medium. It may be
necessary to use a cell permeabilizing procedure such as electroporation, addition of
calcium or rubidium chloride, heat shock, etc. to facilitate entry of the modified
nucleotide into the cells. The cells are then grown in the presence of the modified
nucleotide triphosphate until incorporation of the modified nucleotide(s) induces arrest
of cell growth in selected cells.

6. Add penicillin, ampicillin, nalidixic acid or any other antibiotic that
selectively kills actively dividing cells. Continue growing the cells for a selected time.

7. Spin the cells out, suspend them in fresh LB media and plate them. Grow 
for an empirically determined time. 

8. Select colonies, isolate the plasmids and repeat steps 4 to 7 for additional
rounds of selection or, in the alternative, use a biochemical assay for incorporation of
the modified nucleotide to examine individual colonies or pools of colonies. Such an
assay might entail polymerization of a template in the presence of radiolabeled
modified nucleotide on individual clones or on pools of clones in a sib selection scheme.

9. Further characterize the polymerase(s) determined to have the desired 
activity by the assay of step 8. 

10. Remutagenize the polymerase(s) obtained in Step 8 and repeat the
selection procedure from Step 3.

11. When an acceptable level of ability to incorporate the modified nucleotide
is achieved, isolate and characterize the polymerase.

Another method for selecting active polymerases for incorporation of modified
nucleotides involves use of the phage display system, which allows foreign proteins
to be expressed on the surface of bacteriophage as fusions with phage surface
proteins (Kay, B. K., Winter, J. and McCafferty, J. (Editors), Phage Display of
Peptides and Proteins: A Laboratory Manual, Academic Press, 1996). Establishing
an experimental system for detection of a mutant polymerase would entail
expressing mutant polymerases on the surface of a library of phage, and
subsequently isolating phage bearing polymerases with the desired polymerase
activity. Aspects of such a system have been described for selection of an active
nuclease (Pedersen et al., Proc. Natl. Acad. Sci. USA, 1998, 95:10523-8). A
modification of that procedure might be used for mutant polymerase selection. That
is, oligonucleotides which are covalently attached to proteins on the phage surface
can be extended by mutant polymerases expressed on the surface of the same
phage. The oligonucleotides must fold up to provide a primer-template complex
recognizable by the polymerase, or alternatively a primer complementary to the
oligonucleotide can be provided separately. In either event, the portion of the
oligonucleotide serving as a template for polymerization will contain nucleotides
complementary to the modified nucleotide(s) for which an efficient polymerase is
being sought. The template oligonucleotide may also be designed so that the
extension product is easily detectable as a result of templated incorporation of a
labeled nucleotide which occurs only after polymerization across the segment of
template requiring incorporation of the modified nucleotide(s). One method for
selectively enriching phage bearing polymerases with the desired catalytic
properties involves use of a fluorescence activated cell sorter (FACS). Dye-labeled
nucleotides would be incorporated in a primer extension reaction only after
incorporation of the test modified nucleotide(s). After removal of unincorporated
nucleotides, the phage with attached dye-labeled nucleotides (which must encode
mutant polymerases capable of incorporating the modified nucleotide or nucleotides)
can be enriched in one or more rounds using fluorescence activated cell sorting
procedures (Daugherty, P.S., et al., Antibody affinity maturation using bacterial
surface display, Protein Eng. 11:825-32, 1998). Alternatively, the modified
nucleotide(s) themselves can be labeled with dye, and detection will similarly be
accomplished by FACS sorting of dye-labeled phage. This procedure has the
disadvantage that the dye may interfere with polymerization; however, one skilled in
the art will recognize that the dye can be attached to the modified nucleotide via a
linkage that is unlikely to inhibit polymerization.

Yet another approach to identifying active polymerases for modified nucleotide 
incorporation would be to use available X-ray crystal structures of polymerases bound 
to template DNA and nucleotide substrate. Based on observed or predicted interactions 
within the polymerase/substrate complex, rational amino acid changes could be created 

20 to accommodate the structural deviation of a given modified nucleotides. For example, 
based on the structural information on a complex of T7 polymerase and its substrates 
for which the X-ray crystal structure shows the amino acids that are in the polymerase 
active site (Doublie et. al., Nature, 1998, 391:251-258), site-directed mutagenesis might 
be designed for structurally similar protein Klenow to increase its specific activity for 

25 incorporation of ribonucleotides (rNTPs) and/or 5'-amino-nucleotides (5'-aminodNTPs). 

The E710A mutant of Klenow (Astatke et al., Proc. Nat. Acad. Sci. USA, 1998,
95:3402-3407) has an increased capacity to incorporate rNTPs as compared to wild-type
Klenow, probably because the mutation removes the steric gate against the 2'-hydroxyl
group of rNTPs. This mutation, however, decreased the mutant's activity for
incorporation of natural dNTPs and 5'-aminodNTPs. In this case, use of the E710S
mutation might lead to improved activity, because the serine side chain might
hydrogen-bond with the 2'-OH of rNTP substrates. The E710A or E710S mutation might also be used in
combination with Y766F, a previously described mutant which by itself has little effect
on polymerase activity (Astatke et al., J. Biol. Chem., 1995, 270:1945-54). The crystal
structure reveals that the hydroxyl of Y766 forms hydrogen bonds with the side chain of
E710, which might affect polymerase activity when E710 is truncated to Ala. On the
other hand, E710 mutations in combination with F762A might improve activity by
holding the sugar ring in a defined position. Similarly, better incorporation of the
5'-amino analogs might be achieved by relaxing the binding of the polymerase to the
nucleotide substrate, since the 5'-nitrogen changes the conformation of the nucleotide
and thus the alignment of the alpha-phosphorus atom. Initially, mutagenesis could focus
on a limited number of residues that engage the sugar and phosphates of
the nucleotide substrate, such as R668, H734 and F762. The H881 residue
might also be a useful target: although it is further from the dNTP binding site, an Ala substitution at
this position influences the fidelity of dNTP incorporation (Polesky et al., J. Biol. Chem.,
1990, 265:14579-91). These residues could be targeted for cassette mutagenesis to
identify the amino acid substitution with the greatest effect, followed by selection for active
polymerases as described. The R668K substitution is particularly interesting, because it
should eliminate contact with the dNTP while preserving the minor groove interaction with
the primer 3'-NMP. On the other hand, although R754 and K758 contact the beta and
alpha phosphates, changes at these positions are likely to severely impair catalysis;
histidine or lysine at these positions could preserve interactions with the phosphates
and might retain activity.

One skilled in the art will recognize that the collection of preferred amino acid
modifications to Klenow polymerase described above may be applied to other
polymerases to produce useful mutant versions of those polymerases. This can be
accomplished by aligning the amino acid sequences of the other polymerases with
that of Klenow polymerase to determine the location of the corresponding amino
acids in the other polymerases and/or, where crystal structures are available, by
comparing the three-dimensional structures of other polymerases with that of Klenow
polymerase to identify orthologous amino acids. Methods for performing site-directed
mutagenesis and expressing mutant polymerases in prokaryotic vectors are
known in the art (Ausubel, F. M., et al., Current Protocols in Molecular Biology, John
Wiley & Sons, 1998).
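The alignment-based transfer of mutations can be illustrated with a short sketch. It assumes that a pairwise alignment has already been produced by some external alignment tool and is supplied as two gapped rows of equal length; the function `map_residue`, the toy sequences and the residue numbers below are hypothetical and are not taken from Klenow or any real polymerase.

```python
def map_residue(aligned_ref: str, aligned_target: str,
                ref_start: int, target_start: int, ref_position: int):
    """Return the target residue number aligned to ref_position.

    aligned_ref / aligned_target: gapped alignment rows ('-' marks a gap).
    ref_start / target_start: residue number of the first residue in each row.
    Returns None when ref_position is aligned to a gap in the target.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("alignment rows must have equal length")
    ref_pos, tgt_pos = ref_start - 1, target_start - 1
    for r, t in zip(aligned_ref, aligned_target):
        if r != "-":
            ref_pos += 1
        if t != "-":
            tgt_pos += 1
        if r != "-" and ref_pos == ref_position:
            # An orthologous residue exists only if the target row has one in this column.
            return tgt_pos if t != "-" else None
    raise ValueError("ref_position is not covered by the alignment")


# Toy alignment (hypothetical sequences and numbering).
ref_row = "MKE-RTY"
tgt_row = "MRDQR-Y"
for pos in (702, 703, 704):
    print(pos, "->", map_residue(ref_row, tgt_row, 700, 500, pos))
```

The same position bookkeeping would apply if the two rows came from a structure-based superposition rather than a sequence alignment.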
In addition to producing and screening for mutant polymerases capable of
incorporating modified nucleotides, it may also be useful in some instances to screen for
other polymerase properties. In general, the additional desirable polymerase properties
described below are more difficult to assay than incorporation of modified nucleotides,
so assays for these additional properties may be conducted as a second screen of
mutant polymerases with demonstrated capacity to incorporate modified nucleotides.
One aspect of this invention is that cleavage at modified nucleotides may be caused or
enhanced by contact between the modified nucleotides and a polymerase (see
Example 3 and Figures 20-26). This is a preferred cleavage mode, as it obviates a
separate cleavage step. Thus it is useful to assay mutant polymerases for cleavage-enhancing
properties. One simple assay for such properties is a primer extension
in which the extension sequence following the primer includes the cleavable nucleotide(s)
followed by the first occurrence of a different nucleotide, which is detectably labeled. In
the event of polymerase-assisted cleavage, the labeled molecule will be separated from
the primer, resulting in a smaller labeled molecule that can be detected by
electrophoretic or other methods. A second useful property of mutant polymerases is
the ability to recognize a modified nucleotide or nucleotides in a template strand and
catalyze incorporation of the appropriate complementary nucleotide (natural or
modified) on the nascent complementary strand. This property is a necessary condition
for a polymerase to be used in a cycling procedure such as PCR, where newly
synthesized polynucleotides serve as templates in successive rounds of amplification.
A simple assay for this property is a short primer extension in which the template
strand is synthesized with the modified nucleotide or nucleotides occurring shortly after
the end of the primer, such that a primer extension reaction will soon encounter the
modified nucleotide(s). Successful polymerization across the template, indicating use of
the modified nucleotide(s) as templates, will yield a longer extension product than
failure to utilize the modified nucleotides as templates. The extension product can be
made easily detectable by synthesizing the template so as to cause templated
incorporation of a labeled nucleotide only after traversing the modified nucleotide(s).
The sequence of the extension product can subsequently be determined to confirm that
the nucleotides incorporated on the extension strand opposite the modified nucleotides
are correct. Still other attractive properties of polymerases include high fidelity,
thermostability and processivity. Assays for these properties are known in the art.
Example 2. Variance Detection by Mononucleotide Restriction 
The following procedure is an example of nucleotide sequence variance
detection in a polynucleotide without the necessity of obtaining the complete sequence
of the polynucleotide. While the modified nucleotide used in this example is
7-methylguanine (7-methylG) and the polynucleotide under analysis is a 66 base-pair
fragment of a specific DNA, it is understood that the described technique may be
employed using any of the modified nucleotides discussed above or any other modified
nucleotides which, as noted above, are within the scope of this invention. The
polynucleotide may be any polynucleotide of any length that can be produced by a
polymerase.

A 66 base pair region of the 38 kDa subunit of replication factor C (RFC)
cDNA was amplified by PCR (polymerase chain reaction). Three primers were used
in two separate amplification reactions. The forward primer (RFC bio) was
biotinylated. This allows the isolation of a single-stranded template using
streptavidin-coated beads, which can then be extended using the Klenow exo- fragment
of E. coli DNA polymerase to incorporate the 7-methylG. This also permits
cleanup of the modified 7-methylG DNA after extension and prior to cleavage.

Two reverse primers were used, one in each amplification reaction; one
matched the natural sequence of the RFC gene (RFC), while the other (RFC mut)
introduced a single base change (T to C) into the 66 base pair RFC sequence. The
primers and corresponding products are also labelled RFC 4.4 and RFC 4.4 Mut in
some of the Figures herein.

Using PCR and the above primers, 66 base pair fragments were
produced (Fig. 1). The two fragments differ at one position: a T to C change in the
biotinylated strand and an A to G change in the complementary strand (encoded by
the two reverse primers). The PCR products were purified using streptavidin
agarose, and the non-biotinylated strand from each PCR product was eluted and
used as a template for primer extension. The biotinylated primer RFC bio was
extended on these templates in the presence of dATP, dCTP, dTTP and
7-methyl-dGTP.
The streptavidin agarose-bound single-stranded DNA was then incubated
with piperidine for 30 minutes at 90° C to cleave at sites of incorporation of
7-methylG into the modified DNA fragment. This treatment also resulted in the
separation of the biotinylated fragment from streptavidin. The reaction mixture was
subjected to centrifugation and the polynucleotide-containing supernatant was
transferred to a new tube. The DNA was dried in a speed vac and resuspended in
deionized water. This sample was then subjected to MALDI mass spectrometry.

Figure 2 shows the molecular weights of the expected fragments of interest
resulting from cleavage of the biotinylated DNA strand at each site of
incorporation of 7-methylG. These fragments and their molecular weights are: a
27-mer (8772.15), a 10-mer (3069.92), an 8-mer (2557.6), and one of the following
10-mers, depending on the reverse primer used in the PCR reaction: RFC (3054.9) or
RFC mut (3039.88). The biotinylated 20-mer primer is also present because it was
provided in excess in the extension reaction. The 10-mer fragments for RFC and
RFC mut, which differ by 15 daltons, are the ones which should be detected and
resolved by mass spectrometry, thus revealing the point mutation.
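The 15 Da figure can be checked from nucleotide residue masses. The sketch below assumes commonly tabulated average residue masses for deoxynucleotides within a DNA chain (dA 313.21, dC 289.18, dG 329.21, dT 304.20 Da); these values are not given in this example, and the helper names are illustrative only.

```python
from itertools import combinations

# Average residue masses (Da) of deoxynucleotides within a DNA chain
# (commonly tabulated values; assumed here, not taken from the example).
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def substitution_shift(old: str, new: str) -> float:
    """Mass change (Da) of a fragment when base `old` is replaced by `new`."""
    return round(RESIDUE_MASS[new] - RESIDUE_MASS[old], 2)


def smallest_shift():
    """Return the base pair whose interchange gives the smallest mass change."""
    pairs = combinations(sorted(RESIDUE_MASS), 2)
    best = min(pairs, key=lambda p: abs(substitution_shift(*p)))
    return best, abs(substitution_shift(*best))


print(substitution_shift("T", "C"))   # T to C in the biotinylated strand
print(round(3054.9 - 3039.88, 2))     # RFC vs RFC mut 10-mers from Figure 2
print(smallest_shift())               # hardest single substitution to resolve
```

Under these assumed masses the A/T interchange (about 9 Da) is the smallest single-base change, so resolving every possible substitution in a fragment of roughly 3 kDa would call for a mass accuracy of a few daltons.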

Figure 3 shows a denaturing polynucleotide sequencing gel analysis of the
RFC and RFC mut Klenow polymerase extension fragments before and after
cleavage with piperidine. All the expected fragments were present in both cases.
Most of the additional minor bands are the result of incomplete cleavage of the DNA
strand by piperidine. Complete cleavage may be achieved through two cycles of
piperidine treatment using freshly distilled piperidine for 30 minutes at 90° C, with
each cycle being followed by drying and washing of the samples (data not shown).
The band from the RFC mut cleavage (lane 4 of Fig. 3) which runs between the
8-mer and the 10-mer is the only band not explained by complete or incomplete
cleavage.

Figure 4 is the mass spectrogram of the RFC sample. The peak on the
far right is the biotinylated primer, which was used as a standard to calculate
the molecular weights of all other peaks. The left side of the spectrogram reveals all
three expected cleavage products (two 10-mers and an 8-mer). The inset in Figure 4
is a magnified view of the region surrounding the two 10-mers and the 8-mer. The
molecular weights in this region were all uniformly off by about 20 daltons because
the primer used for calibration was off by 20 daltons. However, the mass differences
between the peaks were all exactly as predicted.

Figure 5 shows the mass spectrogram, and a magnified portion thereof, from
the RFC mut sample. Two peaks should remain the same between the RFC and
RFC mut samples: one of the 10-mers (3089.67) and the 8-mer (2576.93). The
molecular weight of the remaining 10-mer should be decreased in the RFC mut
sample by 15.02 Da (from 3054.9 to 3039.88) due to the single T to C switch, and the
mass difference between it and the unchanged RFC 10-mer should be 30.04 Da
(3039.88 vs. 3069.92). However, the mass difference actually obtained from the
RFC mut sample was 319.73 Da. This might be due to a deletion of a C from the 10-mer
corresponding to nucleotides 57 - 66. This would also explain the anomalous 9-mer
on the RFC mut sequencing gel (Figure 3). For this to be so, the commercially
obtained primer used in the amplification reaction would have to have been missing
a G. The expected molecular weights for the RFC primer, the RFC mut primer and
the RFC mut primer with a single G deletion are shown in Table 6. To test the
hypothesis that an error had occurred in the synthesis of the RFC mut oligonucleotide
primer, the RFC and RFC mut oligonucleotides were combined and subjected
to mass spectrometry. As can be seen from the mass differences obtained (Fig. 6
and Table 6), the hypothesis was correct: the RFC mut primer was indeed missing
one G.
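The reasoning behind the deletion hypothesis can be written as a small calculation: subtract the predicted 30.04 Da difference from the observed 319.73 Da and ask which single residue best accounts for the excess. The sketch assumes commonly tabulated average deoxynucleotide residue masses (not values given in this example); `best_single_deletion` is an illustrative helper, not part of the described method.

```python
# Average deoxynucleotide residue masses in Da (commonly tabulated values, assumed here).
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def best_single_deletion(extra_mass: float):
    """Return (base, residual) for the single-residue deletion closest to extra_mass."""
    base = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - extra_mass))
    return base, round(extra_mass - RESIDUE_MASS[base], 2)


expected_diff = 3069.92 - 3039.88      # predicted RFC vs RFC mut 10-mer difference
observed_diff = 319.73                 # difference measured for the RFC mut sample
base, residual = best_single_deletion(observed_diff - expected_diff)
print(base, residual)                  # deleted base and unexplained mass (Da)

# Table 6 check: RFC mut primer minus one G residue.
print(round(6115.9 - RESIDUE_MASS["G"], 1))
```

With these assumed masses the excess is best explained by a missing C (residual about 0.5 Da) in the biotinylated strand, which corresponds to the missing G in the complementary reverse primer, and removing one G from the 6115.9 Da RFC mut primer reproduces the 5786.7 Da entry of Table 6.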

The power of the method of this invention is dramatically revealed in the 
above experiment. What began as a controlled test of the method using a known 
sequence and a known nucleotide variance actually detected an unknown variance 
in an unexpected place - the RFC mut primer. 

Example 3. Variance Detection by Dinucleotide Restriction

A restriction enzyme that has a four base pair recognition site will cleave
DNA with a statistical frequency of one cleavage every 256 (4^4) bases,
resulting in fragments that are often too large to be analyzed by mass spectrometry
(Figure 19A). Our chemical dinucleotide restriction strategy, on the other hand,
would result in much smaller fragments of the same polynucleotide. The average
size of the fragments obtained is 16 (4^2) bases (Figure 19B), which is quite amenable
to mass spectrometry analysis.
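The statistical fragment sizes can be checked with a quick simulation on a random sequence in which each base occurs with probability 1/4, an idealization that real genomes only approximate. The four-base site GATC and the dinucleotide GT are arbitrary examples, and the helper name is illustrative.

```python
import random


def mean_fragment_length(seq: str, site: str) -> float:
    """Average fragment length if seq is cut once at every occurrence of site.

    str.count counts non-overlapping matches, which is exact for sites such as
    GATC or GT that cannot overlap themselves.
    """
    return len(seq) / (seq.count(site) + 1)


random.seed(0)
genome = "".join(random.choices("ACGT", k=1_000_000))  # random, equiprobable bases

print(round(mean_fragment_length(genome, "GATC")))  # expected near 4**4 = 256
print(round(mean_fragment_length(genome, "GT")))    # expected near 4**2 = 16
```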
An example of this chemical restriction principle is illustrated in Figure 20.
Depicted in this figure is a dinucleotide pair having a ribonucleotide and a
5'-aminonucleotide connected in 5' to 3' orientation, thereby positioning the
2'-hydroxyl group of the ribonucleotide in close proximity to the phosphoramidate
linkage. The chemical lability of the phosphoramidate linkage is enhanced since the
hydroxyl group can attack the phosphorus atom to form a 2',3'-cyclic phosphate,
resulting in the cleavage of DNA at this particular dinucleotide site.

Shown in Figure 21 is an actual application of this approach. A 5'-32P-labeled
20nt primer was extended with a mixture of Klenow (exo-) and E710A Klenow (exo-)
polymerases using an 87nt single-stranded template in a Tris buffer at pH 9. The
primer extension was performed with riboGTP (lane 1), 5'-aminoTTP (lane 3), or
riboGTP/5'-aminoTTP (lane 5) in place of the corresponding natural nucleotides.
After the extension, the reaction mixtures were purified on a G25 column. The
riboG-containing extension product was cleaved with aqueous base to generate a G
sequencing ladder (lane 2). The 5'-aminoT-containing product, on the other
hand, was acid labile and was cleaved to afford a T sequencing ladder (lane 4). Under
the conditions of the extension reaction with riboGTP/5'-aminoTTP (lane 5), a 64nt
product was obtained instead of the expected 87nt product. Interestingly, the 64nt fragment
is one of the dinucleotide cleavage products expected for GT restriction and the only
one which should be visible by autoradiography. Acid cleavage of this product
produced a T ladder (lane 6), whereas base cleavage generated a G ladder (lane 7),
indicating the successful incorporation of both riboGTP and 5'-aminoTTP into the
polynucleotide. From these results it can be concluded that GT restriction cleavage
had occurred during the extension and/or workup procedures, most likely due to the
synergistic lability of the two modified nucleotides.

In order to visualize all three expected restriction fragments, the same
extension-cleavage experiment was performed in the presence of α-32P-dCTP. As
shown in Figure 22, three GT restriction fragments were observed with the expected
relative mobility and specific radioactivity.

The versatility of this dinucleotide restriction approach is demonstrated by AT
restriction of the same DNA. Specific AT restriction was observed by polyacrylamide
gel electrophoresis (PAGE) analysis (Figure 23). A similarly generated non-radioactive
product was analyzed by MALDI-TOF mass spectrometry (Figure 24).
All the expected restriction fragments were observed except for a 2nt fragment,
which was lost during G25 column purification.

The general applicability of this technology is further demonstrated using a
longer, different DNA template (Figures 25 and 26). Primer extension with
riboATP and 5'-aminoTTP followed by AT restriction generated the expected
oligonucleotides, as observed by PAGE analysis (Figure 25) or MALDI-TOF mass
analysis (Figure 26).
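Expected fragments for GT or AT restriction can be predicted by splitting a sequence between the two bases of every occurrence of the chosen dinucleotide, with the ribonucleotide staying on the 5' fragment and the 5'-aminonucleotide starting the 3' fragment. The sketch assumes complete substitution and complete cleavage, ignores end chemistry (such as the 2',3'-cyclic phosphate), and uses a made-up test sequence.

```python
def dinucleotide_restriction(seq: str, dinuc: str) -> list[str]:
    """Split seq 5'->3' between the two bases of every occurrence of dinuc.

    Assumes every occurrence is substituted and cleaved; masses and end
    modifications are not modeled.
    """
    if len(dinuc) != 2:
        raise ValueError("dinuc must be two bases, e.g. 'GT' or 'AT'")
    fragments, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == dinuc:
            fragments.append(seq[start:i + 1])  # fragment ends with the ribonucleotide
            start = i + 1                       # next fragment starts at the 5'-amino base
    fragments.append(seq[start:])
    return fragments


toy = "CCGTAAGTCATAT"  # hypothetical sequence
for site in ("GT", "AT"):
    print(site, dinucleotide_restriction(toy, site))
```

Very short products such as the 2-nt "TA" fragment in the AT digest of this toy sequence are the kind that can be lost during size-based cleanup, as observed for the G25 purification above.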

Example 4. Genotyping by Complete Substitution/Complete Cleavage

The following genotyping procedure by chemical restriction is an attractive
alternative to other genotyping methods, with many advantages including increased
accuracy and speed. In general, this method involves PCR amplification of genomic
DNA using chemically modified nucleotides, followed by chemical cleavage of the
resulting amplicons at the modified bases. Shown in Figure 27 is a schematic
presentation of this technique. One of the primers (Primer 1) is designed to be close
to the polymorphic site of interest so that one of the polymorphic bases (e.g., A) may
be selected as the first cleavable nucleotide. After PCR amplification with the
chemically modified nucleotide (supplemented with the other three natural
nucleotides), only one of the two alleles would be cleavable at the polymorphic site.
Treatment with chemical reagents would afford cleavage products comprising Primer
1, whose length can reveal the genotype of the sample. Analysis by either mass
spectrometry or electrophoresis can be implemented for identifying the expected
length difference. Furthermore, mass spectrometry analysis may unmask the single
base difference on the complementary strand of DNA that contains the
polymorphism, providing built-in redundancy and higher accuracy.
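The length readout can be sketched as follows: with complete substitution and cleavage, the Primer 1-containing fragment ends just before the first modified base synthesized after the primer. The primer length and downstream sequences below are hypothetical, chosen only so that the A allele places A at the 3rd position after the primer and the G allele places the first A at the 6th position, mirroring the arrangement described for the TR gene below.

```python
def primer_fragment_length(primer_len: int, downstream: str, cleavable: str = "A") -> int:
    """Length (nt) of the Primer 1 fragment after complete substitution/cleavage.

    downstream: bases synthesized immediately 3' of Primer 1 on the same strand.
    Assumes cleavage removes the first modified base, so the fragment ends just
    before it; 3'-end adducts change mass but not nucleotide count.
    """
    idx = downstream.find(cleavable)
    if idx == -1:
        raise ValueError("no cleavable base downstream of the primer")
    return primer_len + idx


PRIMER_LEN = 20                            # hypothetical primer length
alleles = {"A": "CTACCA", "G": "CTGCCA"}   # hypothetical sequences 3' of Primer 1
for allele, seq in alleles.items():
    print(allele, primer_fragment_length(PRIMER_LEN, seq))
```

In this toy case the two alleles give fragments that differ by 3 nt, the same length difference as the 25nt and 28nt TR products.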

Illustrated in Figures 28 to 31 are the chemical cleavage and analysis
procedures used to genotype the transferrin receptor (TR) gene. An 82bp DNA
sequence of the TR gene was selected based on the location of the polymorphism and
efficiency of amplification (Figure 28). The polymorphic base (A or G) is positioned 3
bases from the 3' end of Primer 1. For the A allele it is the first modified nucleotide to be
incorporated; for the G allele, the first cleavable base is 6 bases from the primer. As a
result, fragments of different lengths are produced by chemical cleavage. The
PCR amplification reactions (50 µl each) were carried out in standard buffer with
the polymerase AmpliTaq Gold (0.1 unit/µl) in a thermal cycler (MJ Research PTC-200) using 35
cycles of amplification (1 min denaturation, 1.5 min annealing, and 5 min extension).
Analysis of the PCR products on a 5% non-denaturing polyacrylamide gel (stained
with Stains-All from Sigma) showed that 7-deaza-7-nitro-dATP can replace dATP for
efficient PCR amplification (Figure 28).

To the PCR products from 7-deaza-7-nitro-dATP were directly added
piperidine, tris(2-carboxyethyl)phosphine (TCEP), and Tris base to final
concentrations of 1 M, 0.2 M, and 0.5 M, respectively, in a total volume of 100 µl. After
incubation at 95° C for 1 hour, 1 ml of 0.2 M triethylammonium acetate (TEAA) was
added to each reaction mixture and the resulting solution was purified on an OASIS
column (Waters). The eluted products were concentrated to dryness in a Speedvac
and the residue was analyzed by mass spectrometry or electrophoresis. Figure 29
shows the sequences of selected fragments expected from cleavage at
7-deaza-7-nitro-dA. The sequences are grouped according to length and molecular weight.
The first group contains longer fragments that are extended from primers. The 22nt
fragment is invariant and may be used as an internal reference. The 25nt or
28nt fragment is expected from the A or G allele, respectively. The shaded group of
sequences is from the complementary strand of DNA, including invariant 13nt and
11nt fragments that can be used as internal references and a pair of 11nt fragments
expected from the two allelic forms of the TR gene with a 15 Da mass difference. Shown in
Figure 30(a) is a MALDI-TOF spectrum of chemically cleaved products from an 82bp
heterozygote TR DNA sample. Highlighted in the spectrum are the two regions that
contain the fragments depicted in Figure 29.

Each purified cleavage sample was mixed with 3-hydroxypicolinic acid and
subjected to MALDI-TOF analysis on a Perceptive Biosystems Voyager-DE mass
spectrometer. Mass spectra in the region of 7000-9200 daltons were recorded, and
the results for the three TR genotypes are shown in Figure 30 (b). The spectra were
aligned using the peak representing the invariant 22nt fragment (7189 Da). Two
additional peaks were observed for the AG heterozygote sample, one corresponding
to the A allele (8057 Da) and the other to the G allele (9005 Da). As expected, only one
additional peak was observed for the GG or AA homozygote samples, each with the
molecular weight of the cleavage fragment from the G or A allele, respectively. Figure 31 (a) shows a
mass spectrum of the AG heterozygote sample in the region of 3700-4600 Da. With
3807 Da and 4441 Da fragments as internal references, the genotype of this sample
was confirmed through the observation of two peaks in the middle of the spectrum
with a 15 Da mass difference. The molecular weights observed by mass spectrometry
indicated that phosphate-deoxyribose-TCEP adducts were uniformly formed during
the cleavage reaction, resulting in fragments that are modified at the 3' end (Figure 31
(b)). The data shown in Figure 30 and Figure 31 also illustrate that the combination
of chemical restriction with mass spectrometry can provide corroborating genotyping
information from both strands of DNA, thereby assuring the accuracy of the analysis.
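The genotype call from the 7000-9200 Da region can be expressed as a simple peak-matching rule, sketched below with the masses reported for the TR example (7189 Da reference, 8057 Da A allele, 9005 Da G allele). The 5 Da matching tolerance, the function name and the peak lists passed in are assumptions for illustration, not parameters or data from this example.

```python
def call_genotype(peaks, reference_mass, allele_masses, tolerance=5.0):
    """Return a genotype such as 'AG', 'AA' or 'GG' from observed peak masses (Da).

    Returns None if the invariant reference peak is missing (e.g. failed
    amplification or a bad spectrum) or no allele-specific peak is found.
    """
    def seen(mass):
        return any(abs(p - mass) <= tolerance for p in peaks)

    if not seen(reference_mass):
        return None
    present = [allele for allele, mass in sorted(allele_masses.items()) if seen(mass)]
    if not present:
        return None
    return present[0] * 2 if len(present) == 1 else "".join(present)


TR_ALLELES = {"A": 8057.0, "G": 9005.0}
# Hypothetical peak lists for illustration.
print(call_genotype([7189.2, 8056.1, 9004.3], 7189.0, TR_ALLELES))  # heterozygote
print(call_genotype([7188.7, 9005.8], 7189.0, TR_ALLELES))          # homozygote
```

Requiring the invariant reference peak before calling a genotype is one way to avoid mistaking a failed reaction for a homozygote.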
Alternatively, the chemically restricted samples may be analyzed by
electrophoresis to detect the diagnostic length difference resulting from the two
alleles. Capillary electrophoresis (CE) analyses were performed using a homemade
instrument with a UV detector and a capillary containing denaturing linear
polyacrylamide gel. Figure 32 (a) shows the CE chromatograms obtained from TR
samples of various genotypes. As predicted, each genotype showed a distinct
elution pattern corresponding to the lengths of the expected cleavage products.
Whereas the AA homozygote produced a 25nt fragment and the GG homozygote generated
a 28nt fragment, the AG heterozygote sample afforded both 25nt and 28nt products.
After 5'-end labeling with 32P, the cleavage samples were subjected to PAGE
analysis. The resulting autoradiogram in Figure 32 (b) demonstrates that the
cleavage is specific, with little or no background, and that the genotyping results are
unambiguous.

Another alternative detection method involves the application of fluorescence
resonance energy transfer (FRET). FRET has been successfully applied for
polymorphism detection by TaqMan assays (Todd J.A. et al. 1995, Nature Genetics,
3:341-342) and Molecular Beacons (Tyagi, S. et al. 1998, Nature Biotechnology,
16:49-53). However, when longer probes are necessary to achieve their
hybridization to target sequences (e.g., AT-rich sequences), it becomes increasingly
difficult to distinguish the vanishingly small difference resulting from a single
nucleotide mismatch. The advantage of chemical restriction in this regard is
illustrated in Figure 33. Similar to the aforementioned example, a modified
nucleotide analog of one of the polymorphic bases (e.g., A) is used in place of its
natural counterpart in the PCR amplification. Primer 1 is designed to be close to the
polymorphic site so that the polymorphic base A would be the first cleavable
nucleotide for the A allele. Primer 1 is also labeled with a fluorescent group (F1)
positioned close to its 3' end (Figure 33 (a)). After amplification and chemical
restriction, a probe covalently attached to another fluor, F2 (shown in Figure 33 (b)),
can be added and the FRET effect between the two fluorophores measured.
Because one of the alleles was cleaved closer to the 3' end of Primer 1 than the other,
the difference between them in hybridization is expected to be greater than that from a single
nucleotide mismatch, and may be exploited to distinguish the two allelic targets. As
depicted in Figure 33 (c), the experimental temperature can be adjusted so that
only the longer fragment from the G allele can hybridize with the probe, resulting in
FRET. Since in this system a "NO FRET" result could be interpreted either as allele
A or as failed PCR amplification, it is necessary to measure the fluorescence of each
sample at various temperatures to ensure the positive detection of the shorter
fragment from allele A at a lower temperature. Alternatively, this positive detection
may be achieved through the use of a hairpin probe, depicted in Figure 33 (d).
The probe has a 5' end tail that folds back to form a hairpin, in addition to a fluor F3
at the 5' end. With the short cleavage fragment from the A allele, the hairpin probe can
form a bridging duplex as depicted, generating detectable FRET between F1 and
F3. Only with the longer fragment from the G allele can the inter-strand hybridization
compete with the stability of the hairpin and result in loss of FRET between F1 and
F3.

Example 5. Complete Sequencing by Partial Substitution/Partial Cleavage

Using the following procedure, it is entirely possible to sequence, in
one set of sequencing reactions, a polynucleotide consisting of 10,000, 20,000 or
even more bases by polymerization in the presence of modified nucleotides,
enzymatic restriction of polymerization products, purification of restriction fragments
and chemical degradation to produce sequence ladders from each fragment. The
procedure is limited only by the size of the template and the processivity (the ability
to continue the polymerization reaction) of the polymerase used to extend the
primer. Unlike a shotgun cloning library, in which there is a normal distribution of
sequence inserts requiring highly redundant sequencing, the method described
herein results in each nucleotide being sampled once and only once. Repeating the
procedure using a second or even a third restriction enzyme cocktail will provide the
sequence information needed to reassemble the sequences determined from the
initial restriction in the proper order to reconstruct the full-length polynucleotide
sequence, while also supplying the redundancy necessary to ensure the accuracy of
the results. In the description which follows, a variety of options for carrying out each
step are provided. As before, it is understood that other modifications to the
procedure described will be readily apparent to those skilled in the art; such other
modifications are within the scope of this invention.

TABLE 6

Primer                         Molecular Weight (Da)   Mass Difference (Da)
RFC                            6099.6
RFC mut                        6115.9                  +16
RFC mut (single G deletion)    5786.7                  -313.2

a. Anneal primer and template 

The template used may be a small or a large insert cloning vector or
an amplification product such as a PCR fragment; it may also be single- or
double-stranded. For example, without limitation, the template may be a plasmid,
phagemid, cosmid, P1, PAC, BAC or YAC clone. The template is ideally rendered
linear before extension to ensure that all extension products terminate at the same
place. This can be accomplished by restricting the template with a restriction
endonuclease. For example, the templates may be prepared in a vector that has
restriction sites for one or more rare cutters on either side of the cloning site, so that
a linear template can be routinely prepared by restriction using the rare-cutter
enzyme (i.e., an enzyme that cleaves, for example, at a 7 or 8 nucleotide motif). Many
plasmid vectors such as, without limitation, Bluescript (Stratagene, Inc.) have these
features. A primer can be selected which will anneal to a sequence in the vector, for
example, the M13 universal primer sequences. This allows the sequencing of a
library of clones using only one or two primers (one from each side of the insert).
Alternatively, a series of insert-specific primers may be used (at approximately 5-20 kb
intervals) in a version of primer walking.

b. Extend primer in presence of all four natural
deoxyribonucleotides and a modified nucleotide corresponding to one of the
natural nucleotides.

The procedures discussed above are used to extend the primer over
the entire length of the template using one of the modified nucleotides described
above or any other modified nucleotide which is capable of imparting selective
cleavage properties to the modified polynucleotide. In general, the ratio of modified
nucleotide to its natural counterpart can vary over a considerable range, from very
little (approximately 1%) to complete (> 99%) substitution. The controlling factor is
the efficiency of the subsequent chemical cleavage reaction: the more efficient the
cleavage reaction, the lower the level of incorporation can be. The goal is to have
approximately one modified nucleotide per restriction fragment so that, after
cleavage, each molecule in the reaction mixture contributes to the sequencing
ladder. Figure 7 shows one such modified polynucleotide, a linearized, single-stranded
M13 template extended to 87 nucleotides in the presence of the modified
nucleotide 5'-amino-dTTP using the exo-minus Klenow fragment of E. coli DNA
polymerase. Figure 9 shows a 7.2 kb extension product, again produced from an
M13 template, in the presence of 5'-amino-dTTP and dTTP at a molar ratio of 100:1
(Panel A, extension product).
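The relation between substitution level and the number of modified nucleotides per fragment can be sketched with a binomial model. It assumes each eligible position independently receives the analog with a fixed probability; the `discrimination` factor (how strongly the polymerase prefers the natural nucleotide) and the 100-site fragment are hypothetical, since actual incorporation efficiencies depend on the polymerase and the analog.

```python
from math import comb


def incorporated_fraction(ratio_modified_to_natural: float, discrimination: float = 1.0) -> float:
    """Fraction of eligible positions expected to carry the analog.

    discrimination: hypothetical factor by which the polymerase prefers the natural
    nucleotide (1.0 = no preference).
    """
    r = ratio_modified_to_natural / discrimination
    return r / (1.0 + r)


def prob_k_modified(n_sites: int, fraction: float, k: int) -> float:
    """Binomial probability that exactly k of n_sites positions are modified."""
    return comb(n_sites, k) * fraction**k * (1.0 - fraction) ** (n_sites - k)


n_sites = 100                 # e.g. T positions in one restriction fragment (hypothetical)
target = 1.0 / n_sites        # about one modified nucleotide per fragment on average
print(f"substitution level for ~1 per fragment: {target:.0%}")
print(f"P(0)={prob_k_modified(n_sites, target, 0):.3f}, P(1)={prob_k_modified(n_sites, target, 1):.3f}")
print(f"100:1 analog:natural, no discrimination: {incorporated_fraction(100.0):.3f}")
```

Under this idealized model a 100:1 ratio with no discrimination would give about 99% substitution, so in practice the polymerase's preference for the natural nucleotide determines which ratio yields the intended level.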

c. Purify the full length primer extension product (optional) 

In order to eliminate prematurely terminated (i.e., less than full-length)
polymerase extension products, thereby assuring a homogeneous sequencing
ladder on electrophoresis after cleavage, it may be desirable to purify the full-length
or substantially full-length extension products. It is noted, however, that the
purification of the restriction fragments after digestion (step f, below) achieves
essentially the same goal and, in most instances, is likely to suffice. In any event,
the elimination of short extension products can be accomplished by numerous
procedures known in the art, such as spun column chromatography or high
performance liquid chromatography (HPLC). Figure 8 shows a purified full-length
extension product before (Panel A) and after (Panel B) chemical cleavage with acid.
d. Cleave the primer extension product with one or more restriction
enzymes.

As noted previously, the optimal size for DNA sequencing templates (in this case, restriction products) is approximately 300 to 800 nucleotides when gel electrophoresis is used to create the sequencing ladder. Restriction endonucleases must therefore be employed to reduce the full-length extension product of 10 kb or more to a manageable size. Numerous such endonucleases are known in the art; for example, many four-base restriction endonucleases are known, and these will generally yield restriction products in the desired range. Shorter restriction fragments (e.g., less than 300 nucleotides) can also be sequenced, but to make the most efficient use of gel runs it is desirable to separate the restriction fragments into sets according to their length. The shorter fragments will then require relatively brief sequencing run times, while the longer fragments will require a longer gel and/or longer run times. Two or more restriction endonuclease cocktails, each containing one or more restriction endonucleases and a compatible buffer, can be used to provide the overlapping sequence information necessary to reassemble the complete sequence of the polynucleotide from the restriction fragments. Figure 9 shows an exemplary restriction endonuclease digestion of a primer/template complex extended in the presence of dTTP and the modified nucleotide 5'-amino-dTTP. As can be seen in Figure 9, complete cleavage was obtained using the restriction endonuclease Msc I. Other Msc I restriction products are not seen because only the 5' end of the primer extension product was labeled with 32P.
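The size-partitioning strategy can be illustrated with a minimal in-silico digest. This Python sketch assumes a known or reference sequence, complete digestion, a single blunt-cutting four-base enzyme (Hae III, GG^CC, modelled on the top strand only) and illustrative length cut-offs taken from the 300 to 800 nucleotide range above; `digest` and `bin_by_length` are hypothetical helper names.

```python
import re


def digest(seq: str, site: str = "GGCC", cut_offset: int = 2) -> list[str]:
    """Split seq at every occurrence of a recognition site.

    cut_offset is the number of site bases left on the upstream fragment
    (2 for GG^CC, the Hae III cut). Only the top strand is modelled.
    """
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def bin_by_length(fragments: list[str], short_max: int = 300,
                  long_max: int = 800) -> dict[str, list[str]]:
    """Group fragments into gel-run sets using illustrative length cut-offs."""
    bins: dict[str, list[str]] = {"short": [], "standard": [], "too_long": []}
    for frag in fragments:
        if len(frag) < short_max:
            bins["short"].append(frag)
        elif len(frag) <= long_max:
            bins["standard"].append(frag)
        else:
            bins["too_long"].append(frag)
    return bins


# Toy example: two GGCC sites in a 14-nt sequence.
print(digest("AAGGCCTTGGCCAA"))  # ['AAGG', 'CCTTGG', 'CCAA']
```

Calling `digest` again with a second recognition site would model a second enzyme cocktail whose fragments overlap those of the first.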

e. Label the restriction endonuclease products.

To visualize the DNA sequencing ladder generated by this method, it is necessary to label the restriction endonuclease products with a detectable label. Many such labels are known in the art; any of them may be used with the methods of this invention. Among these are, without limitation, radioactive labels and chemical fluorophores. For instance, [35S]dATP (Amersham Pharmacia Biotech, Inc.) or rhodamine-dUTP (Molecular Probes) can be incorporated at the primer extension step. Alternatively, the DNA can be labeled after restriction by modification of the restriction fragment ends using, without limitation, T4 polynucleotide kinase, or by filling in recessed ends with a DNA polymerase and a labeled nucleotide. Such end-labeling is well known in the art (see, for example, Ausubel, F. M., et al., Current Protocols in Molecular Biology, John Wiley & Sons, 1998). End-labeling has the advantage of putting one molecule of label on each DNA fragment, which affords homogeneous sequencing ladders. Labeling of the template strand is of no consequence, since that strand contains no modified nucleotide and therefore will not be cleaved during the chemical cleavage reaction. Thus, no sequencing ladder will be produced for the template strand.

f. Separate the labeled restriction endonuclease products.

The restriction fragments must be separated prior to chemical cleavage. Numerous methods are known in the art for accomplishing this (see, for example, Ausubel, F. M., op. cit.). A particularly useful technique is HPLC, which is rapid, simple, effective and automatable. For example, Figure 10 shows the resolution obtained by HPLC on Hae III-restricted PhiX174 DNA. Ion-pair reversed-phase HPLC and ion-exchange HPLC are two preferred methods of separation.

g. Cleave the separated labeled restriction endonuclease fragments at sites of modified nucleotide incorporation.

Depending on the modified nucleotide incorporated, use one of the cleavage reactions previously described herein or any other cleavage reaction that will selectively cleave at the site of incorporation of the modified nucleotide; such other cleavage reactions are within the scope of this invention.

h. Determine the sequence of the fragment.

Figure 11 shows the sequence ladder obtained from a polynucleotide in which T has been replaced with 5'-amino-T. This ladder, of course, only reveals where T occurs in the complete sequence of the target polynucleotide. To obtain the entire sequence, the above procedure would be repeated three more times, in each case replacing one of the remaining nucleotides, A, C or G, with a corresponding modified nucleotide, e.g., 5'-amino-dATP, 5'-amino-dCTP or 5'-amino-dGTP. When all four individual fragment ladders are in hand, the complete sequence of the polynucleotide can easily be reconstructed by analysis and comparison of the gel sequencing data.
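Combining the four base-specific ladders amounts to assigning each position to exactly one base. The following Python sketch assumes each ladder has already been converted from band positions to 1-based nucleotide positions counted from the labeled end; `merge_ladders` is a hypothetical helper, not part of any sequencing software. Positions no ladder accounts for are reported as N, and a position claimed by two ladders is flagged rather than guessed.

```python
def merge_ladders(ladders: dict[str, set[int]], length: int) -> str:
    """Combine four base-specific ladders into one sequence.

    `ladders` maps each base to the 1-based positions at which that base's
    ladder shows a band. Unassigned positions are reported as 'N'.
    """
    calls = ["N"] * length
    for base, positions in ladders.items():
        for pos in positions:
            if not 1 <= pos <= length:
                raise ValueError(f"position {pos} outside 1..{length}")
            if calls[pos - 1] != "N":
                raise ValueError(
                    f"position {pos} called as both {calls[pos - 1]} and {base}")
            calls[pos - 1] = base
    return "".join(calls)


ladders = {"T": {1, 2, 5, 9}, "A": {3, 8}, "C": {4, 7, 10}, "G": {6}}
print(merge_ladders(ladders, 10))  # TTACTGCATC
```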
Example 6. Complete sequencing by substantially complete substitution/substantially complete cleavage combined with mass spectrometry.

The preceding procedure for complete sequencing of a polynucleotide still requires the use of gel electrophoresis for creating fragment ladders from which the sequence is read. As noted previously, gel electrophoresis is a time- and labor-intensive process which also requires a fair degree of skill to carry out in such a manner as to have a reasonable assurance of reproducible and accurate results. It is an aspect of this invention that the use of gel electrophoresis can be eliminated completely and replaced with relatively simple-to-use, fast, sensitive, accurate, automated mass spectrometry. The basis for this aspect of this invention is the previously discussed uniqueness in the molecular weights of virtually all 2-mers through 14-mers, with the exception of the 8 fragment pairs described above (and other fragment pairs that are based on addition of identical sets of nucleotides to the 8 fragment pairs). The following is an example of how this procedure would be carried out. While the example is described in terms of human intervention and specific analyses at each step, it will be readily apparent to those skilled in the art that a computer program could be devised to completely automate the analytic procedure and further increase the speed of this aspect of this invention. The use of such a computer program is, therefore, within the scope of this invention.

The procedure for determining complete nucleotide sequences by mass spectrometry would entail the following steps:

a. substantially complete replacement of a natural nucleotide in a polynucleotide with a modified nucleotide to form a modified polynucleotide. This would be accomplished by an amplification procedure or by primer extension employing the polymerase reaction discussed above. Optionally, the procedure disclosed above could be used to arrive at the optimal polymerase or set of polymerases for preparing the desired modified polynucleotide;

b. cleavage of the modified polynucleotide under conditions that favor substantially complete cleavage at, and essentially only at, the points of incorporation of the modified nucleotide in the modified polynucleotide; and,

c. determination of the masses of the fragments obtained in the preceding cleavage reaction.

The above three steps are then repeated three more times, each time using a different modified nucleotide corresponding to one of the remaining natural nucleotides. The result will be a series of masses from which all or most of the sequence of the entire original polynucleotide can be ascertained. Any sequence ambiguity which remains after the main analysis is done should be readily resolved by using one or more reactions involving a contiguous dinucleotide substitution/cleavage reaction or by a conventional DNA sequencing procedure. The following is an example of how the analysis of a fragment would proceed.

Given the following 20-nucleotide natural oligomer extended from a 16-mer primer:

5'-primer-TTACTGCATCGATATTAGTC-3'

polymerization in the presence of dTTP, dCTP, dGTP and a modified dATP will result, after substantially complete cleavage, in five fragments whose masses are shown in Table 7. Carrying out the procedure three more times for the remaining three natural nucleotides will result in three more sets of fragments, the masses of which are also shown in Table 7. From these masses, the nucleotide content (but not yet the sequence) of all the fragments can be uniquely determined. The actual sequence is determined by analyzing all four cleavage results together.
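The fragment lists in Table 7 follow mechanically from the cleavage rule stated in its caption (the phosphodiester bond 5' of each modified nucleotide is broken). The Python sketch below regenerates the four cleavage sets from the known 20-nt extension, assuming ideal, substantially complete cleavage; the primer is represented only implicitly by the first list element, and `cleave_5prime` is a hypothetical name.

```python
def cleave_5prime(extension: str, modified: str) -> list[str]:
    """Cut the extension product 5' of every occurrence of `modified`.

    Element 0 is the stretch that stays attached to the primer; it is empty
    when the first extended nucleotide is itself the modified one.
    """
    pieces = [""]
    for base in extension:
        if base == modified:
            pieces.append("")  # a new fragment begins at the modified nucleotide
        pieces[-1] += base
    return pieces


EXTENSION = "TTACTGCATCGATATTAGTC"  # the 20 nt following the primer
for base in "ACGT":
    first, *rest = cleave_5prime(EXTENSION, base)
    print(base, "primer-" + first if first else "primer only", rest)
```

Because ATCG/AGTC and TGCA/TCGA have identical compositions, each pair yields a single mass, which is why the walkthrough counts distinct masses rather than fragments.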

For example, looking at the masses of all the fragments in Table 7, it is readily discernible that only one mass in each cleavage set comprises at least 16 nucleotides, that all the other fragments are 3' of the primer (since the fragment containing the primer must be at least 16 nt) and that there are two nucleotides after the primer in the A cleavage column, three in the C column, five in the G column and none in the T column. Therefore, the sequence must begin with TT followed by an A, then a C, an unknown nucleotide and then a G. The sequence must start with two T residues because neither A, C nor G cleavage occurs in this initial interval. Also, by adding the masses of the fragments in the different cleavage sets, it can be seen that the length of the unsequenced region is 20 nucleotides. The number of nucleotides in each of the four cleavage sets is also readily ascertainable: set A: (primer + 2) + 5 + 4 + 3 + 2 = 16; set C: (primer + 3) + 10 + 3 + 3 + 1 = 20; set G: (primer + 5) + 7 + 5 + 3 = 20; set T: 4 + 3 + 3 + 2 + 2 + 1 = 15. From this information it is clear that there must be overlapping fragments in the A and T sets.

Subtracting the known mass of the primer from those fragments containing the primer reveals the nucleotide content of the sequence immediately following the primer. Thus, in lane A, the residual mass is 608 Daltons, which, from Table 3, corresponds to TT; TT must therefore be the first two nucleotides in the unknown fragment sequence. The sequence following the primer is thus already known to be TTAC_G. From the mass of the 5-mer in the G lane (1514 Daltons), it can be seen that the 5-mer contains three Ts, an A and a C. Thus, the missing nucleotide must be a T; the leading sequence is TTACTG.
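Inferring nucleotide content from a residual mass, as done above for 608 Da and 1514 Da, is a small integer search over base counts. The Python sketch below assumes nominal, integer residue masses (A 313, C 289, G 329, T 304 Da) chosen because they reproduce the masses quoted in this walkthrough, and exact matching; real spectra would need a mass tolerance and the end-group corrections described in the caption of Table 7. `compositions` and `residual_composition` are hypothetical names.

```python
from itertools import product

# Assumed nominal residue masses (Da); end groups are ignored.
RESIDUE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}


def compositions(mass: int) -> list[dict[str, int]]:
    """Return every base composition whose residue masses sum exactly to `mass`."""
    hits = []
    bound = mass // RESIDUE_MASS["C"]  # C is the lightest residue
    for a, c, g in product(range(bound + 1), repeat=3):
        rest = (mass - a * RESIDUE_MASS["A"] - c * RESIDUE_MASS["C"]
                - g * RESIDUE_MASS["G"])
        if rest >= 0 and rest % RESIDUE_MASS["T"] == 0:
            counts = {"A": a, "C": c, "G": g, "T": rest // RESIDUE_MASS["T"]}
            hits.append({b: n for b, n in counts.items() if n})
    return hits


def residual_composition(observed: int, primer_mass: int) -> list[dict[str, int]]:
    """Composition of the bases between the primer and the first cleavage site."""
    return compositions(observed - primer_mass)


print(compositions(608))  # [{'T': 2}]
print(compositions(656))  # []
```

An empty result, as for 656 Da, corresponds to the case in the walkthrough below where a mass difference matches no oligonucleotide.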
TABLE 7

5'-primer-TTACTGCATCGATATTAGTC-3'

Cleavage fragments (listed in 5'-3' order) and masses (Da) for each modified nucleotide:

A fragment   Mass         C fragment    Mass         G fragment    Mass          T fragment   Mass
primer-TT    608+primer   primer-TTA    921+primer   primer-TTACT  1514+primer   primer       Primer only
ACTGC        1463         CTG           861          GCATC         1463          T            304
ATCG         1174         CAT           845          GATATTA       2119          TAC          845
AT           556          CGATATTAGT    3041         GTC           861           TGCA         1174
ATT          860          C             289                                      TCGA         1174
AGTC         1174                                                                TA           556
                                                                                 T            304
                                                                                 TAG          885
                                                                                 TC           532



Table 7: Nucleotide-specific cleavage patterns for the sequence shown at top, which consists of a primer of known sequence and length (not specified) followed by 20 nucleotides of 'unknown' sequence for the purposes of this example. Cleavages in this example occur via a mechanism that breaks the phosphodiester bond 5' of the modified nucleotide. Each cleavage set includes one fragment containing the primer plus however many nucleotides follow the primer before the first occurrence of the modified nucleotide. The known mass of the primer can be subtracted from this (largest) mass to obtain the difference, which gives the mass and therefore the nucleotide content of the sequence immediately 3' of the primer. The masses provided in the table reflect the presence of one external phosphate group in each cleavage mass; however, it should be recognized that, depending on the chemical nature of the nucleotide modification and the cleavage reaction, actual masses will likely differ from those shown in the table. Such differences are expected to be systematic and therefore do not invalidate the thrust of the analysis presented.
Turning now to the masses shown in the T lane of Table 7, the 906 Dalton mass must contain a T, an A and a C. Since a TAC sequence is already known, it may tentatively be held that this is a confirming sequence, part of the overlap of the A and T cleavages. It cannot yet be ruled out, of course, that another 3-mer containing T, A and C exists in the fragment, which is why this assignment must remain tentative at this point.

The next T cleavage fragment must, at a minimum, contain a T and a G. Two T cleavage masses permit this: 946 and 1235. Thus, the additional sequence must be either G followed by T (if the 946 mass is the next mass) or G followed by a C and an A, order not known, and then T. The sequence is now known to be either TTACTGGT or TTACTG(C,A)T (parentheses with a comma between nucleotides are used to indicate unknown order).

Going back to the A cleavage reaction, it can be seen that the next cleavage mass after the TT must contain ACTG. Two masses, 1235 Da and 1524 Da, meet this criterion. If 1235 Da is correct, the seventh nucleotide in the sequence is A because cleavage has to have occurred at that nucleotide. If 1524 Da is correct, then the sequence is CA. CA is consistent with one of the two possibilities discussed above; thus the overall sequence so far is TTACTGCAT.

Looking next at the masses from the C cleavage reaction, it can be seen that the first mass after the initial TTA must be CTG(C,A). Since cleavage will occur 5' of any C, the possibilities are CTG or CTGA; only the first of these is supported by the masses in the C lane. Thus the second mass fragment in the C lane must be CTG followed by another C (because cleavage has occurred at that point). The third mass in the C lane (906 Da) must contain a C, an A and a T, which confirms the previous sequence of CAT. This leaves only two possibilities for the remaining sequences: a C followed by the 10-mer, or the 10-mer followed by a terminal C. However, if the former were the case, then a cleavage fragment from one of the other lanes, A, G or T, should show a 3-mer, 4-mer or 5-mer which contains two Cs. Since none of the masses permits such an oligomer, the lone C must be at the 3' end of the unknown fragment and the 10-mer is next after CAT, giving the following sequence: TTACTGCATC C.

Turning once again to the G cleavages, it is now known that a fragment must exist which contains at least GCATC. From the masses available this may be GCATC itself (1524 Da) or the 7-mer (2180 Da). However, if the mass of the 5-mer is subtracted from the mass of the 7-mer, the remaining mass, 656 Da, does not correspond to any oligonucleotide. Thus, the 7-mer cannot be next, GCATC is the correct sequence and the next nucleotide must be a G (since cleavage has occurred to give the 5-mer). The sequence is now TTACTGCATCG C.

The next mass in the T cleavage series must begin with TCG. The only T cleavage mass which permits such a combination is 1235 Da, which corresponds to a TCGA sequence. This sequence must be followed by a T since cleavage has occurred at that point. The overall sequence is, therefore, TTACTGCATCGAT C.

There is only one mass among the available T cleavage series which contains a C: the 593 Da TC. Thus the nucleotide preceding the terminal C must be a T. Likewise, the only TC-containing mass in the A cleavage series that does not contain two Cs (which is now known to be impermissible) is 1235, or (A,G)TC. The 1235 mass has already been used once (nucleotides 8-11), but it is also known that there is fragment overlap, since the A series only accounts for a total of 16 nucleotides. The sequence is now known to be TTACTGCATCGAT (A,G)TC. However, if the terminal sequence is ATC, there should be a 906 Da mass among the A cleavages; there is not. On the other hand, if the terminal sequence is GTC, a mass of 922 Da should be found among the G cleavage fragments, and there is. Thus, the sequence can now be established as TTACTGCATCGAT AGTC.

There is only one available T cleavage mass containing AG but no C: the 946 Da mass consisting of T(A,G). This mass must account for the AG in positions 17 and 18. Therefore, position 16 must be a T; the sequence is now known to be TTACTGCATCGAT TAGTC.

Only two masses are still available in the A cleavage group, 617 (AT) and 921 (ATT). These complete the overall sequence in two ways, ATATT or ATTAT. None of the masses permits the resolution of this ambiguity. However, all 20 nucleotides in the target oligonucleotide have, in a single experiment, been unambiguously identified, and 18 of the 20 have been unambiguously sequenced.
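The residual ATATT/ATTAT ambiguity can be confirmed by brute force: if two candidate sequences give the same fragment compositions in all four cleavage sets, no set of masses can distinguish them. The Python sketch below assumes that a fragment's mass depends only on its base composition (true for fixed residue masses) and ideal complete cleavage 5' of the modified nucleotide; all helper names are hypothetical.

```python
from collections import Counter


def cleave_5prime(extension: str, modified: str) -> list[str]:
    """Cut 5' of every `modified` nucleotide; element 0 stays on the primer."""
    pieces = [""]
    for base in extension:
        if base == modified:
            pieces.append("")
        pieces[-1] += base
    return pieces


def composition(fragment: str) -> tuple[int, ...]:
    """(A, C, G, T) counts; with fixed residue masses this determines the mass."""
    return tuple(fragment.count(b) for b in "ACGT")


def mass_signature(extension: str) -> dict[str, tuple]:
    """For each modified base: primer-fragment content plus a multiset of the rest."""
    signature = {}
    for base in "ACGT":
        first, *rest = cleave_5prime(extension, base)
        signature[base] = (composition(first), Counter(composition(p) for p in rest))
    return signature


true_seq = "TTACTGCATCGATATTAGTC"  # ...ATATT...
alt_seq = "TTACTGCATCGATTATAGTC"   # ...ATTAT...
print(mass_signature(true_seq) == mass_signature(alt_seq))  # True
```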

With regard to ambiguity generally, be it one, as in the above example, or more than one, as might be the case when sequencing longer fragments, depending on the nature of the ambiguity and the environment in which it exists, i.e., the nucleotides on



WO 00/18967 






PCT/US99/22988 



either side of it, an additional experiment using any one of several available procedures should readily resolve the matter. For instance, an experiment using the dinucleotide cleavage method of this invention might provide the additional information necessary to resolve the ambiguity. Alternatively, some relaxation of the substantially complete cleavage conditions might result in a ladder of masses in which a known mass is joined with an adjacent ambiguous mass in a manner that clarifies the position and order of the ambiguous mass with respect to the known mass. Or, low-accuracy, single-pass Sanger sequencing might be employed. Alone, this relatively easy and rapid version of Sanger sequencing would not provide much valuable information but, as a complement to the method of this invention, it would likely provide sufficient information to resolve the ambiguity (and, to the extent the sequencing ladder obtained is unambiguously readable, it would provide a partial redundancy verifying the mass spectrometry data).
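As an illustration of how a dinucleotide experiment could break the ATATT/ATTAT tie, the Python sketch below cleaves both candidates wherever T is immediately followed by A. Where the cut falls within the dinucleotide is an assumption made for illustration (here, between the two nucleotides), as is the rule that the primer/extension junction is never cut; the helper names are hypothetical.

```python
def cleave_dinucleotide(extension: str, first: str, second: str) -> list[str]:
    """Cut between `first` and `second` wherever `first` is immediately followed by `second`.

    Assumption: the primer/extension junction is never cut.
    """
    pieces = [""]
    for i, base in enumerate(extension):
        if i > 0 and extension[i - 1] == first and base == second:
            pieces.append("")
        pieces[-1] += base
    return pieces


def composition_multiset(pieces: list[str]) -> list[tuple[int, ...]]:
    """Sorted (A, C, G, T) counts, a stand-in for the observed list of masses."""
    return sorted(tuple(p.count(b) for b in "ACGT") for p in pieces)


true_frags = cleave_dinucleotide("TTACTGCATCGATATTAGTC", "T", "A")
alt_frags = cleave_dinucleotide("TTACTGCATCGATTATAGTC", "T", "A")
print(true_frags)  # ['TT', 'ACTGCATCGAT', 'ATT', 'AGTC']
print(alt_frags)   # ['TT', 'ACTGCATCGATT', 'AT', 'AGTC']
print(composition_multiset(true_frags) != composition_multiset(alt_frags))  # True
```

Under these assumptions the two candidates give different fragment compositions (an 11-mer plus ATT versus a 12-mer plus AT), so a single additional reaction would distinguish them.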

CONCLUSION

Thus, it will be appreciated that the method of the present invention provides versatile tools for the detection of variance in polynucleotides, for the determination of complete nucleotide sequences in polynucleotides and for genotyping of DNA.

Although certain embodiments and examples have been used to describe the present invention, it will be apparent to those skilled in the art that changes in the embodiments and examples shown may be made without departing from the scope of this invention.

Other embodiments are within the following claims.
CLAIMS

WHAT IS CLAIMED:

1. A method for cleaving a polynucleotide, comprising:

a. replacing one or more natural nucleotides at substantially each point of occurrence in a polynucleotide with modified nucleotides to form a modified polynucleotide, provided that, when only one natural nucleotide is being replaced, the modified nucleotide is not a ribonucleotide or a nucleoside α-thiotriphosphate;

b. contacting said modified polynucleotide with a reagent or reagents that cleave the modified polynucleotide at substantially each said point of occurrence of said one or more modified nucleotides.

2. The method of claim 1, whereby variance in nucleotide sequence in related polynucleotides is detected, further comprising:

c. determining the masses of said fragments obtained from step b; and,

d. comparing the masses of said fragments with the masses of fragments expected from cleavage of a related polynucleotide of known sequence, or

e. repeating steps a-c with one or more related polynucleotides of unknown sequence and comparing the masses of said fragments of said polynucleotide with the masses of fragments obtained from the related polynucleotides.

3. The method of claim 1, whereby the nucleotide sequence of said polynucleotide is determined, comprising:

c. determining the masses of said fragments obtained from step b;

d. repeating steps a, b and c, each time replacing a different natural nucleotide in said polynucleotide with a modified nucleotide until each natural nucleotide in said polynucleotide has been replaced with a modified nucleotide, each modified polynucleotide has been cleaved and the masses of the cleavage fragments have been determined; and,

e. constructing said nucleotide sequence of said polynucleotide from said masses of said first fragments.

4. The method of claim 1, whereby a polynucleotide known to contain a polymorphism or mutation is genotyped, comprising:

c. using, as said natural nucleotide to be replaced, a nucleotide known to be involved in said polymorphism or mutation;

d. replacing said natural nucleotide at substantially each point of occurrence by amplifying said polynucleotide using a modified nucleotide to form a modified polynucleotide;

e. cleaving said modified polynucleotide into fragments at substantially each point of occurrence of said modified nucleotide;

f. analyzing said fragments to determine genotype.

5. The method of claim 4, wherein said analysis of said fragments comprises using electrophoresis, mass spectrometry or FRET detection.

6. The method of claim 1, comprising:

a. replacing a first natural nucleotide at substantially each point of occurrence in a polynucleotide with a modified nucleotide to form a once modified polynucleotide;

b. replacing a second natural nucleotide at substantially each point of occurrence in said once modified polynucleotide with a second modified nucleotide to form a twice modified polynucleotide; and,

c. contacting said twice modified polynucleotide with a reagent or reagents which cleave the twice modified polynucleotide at each point in said twice modified polynucleotide where said first modified nucleotide is followed immediately in sequence by said second modified nucleotide.

7. The method of claim 6, whereby variance in nucleotide sequence of related polynucleotides is detected, comprising:

d. determining the masses of said fragments obtained from step c;

e. comparing the masses of said fragments with the masses of fragments expected from cleavage of a related polynucleotide of known sequence, or

f. repeating steps a-d with one or more related polynucleotides of unknown sequence and comparing the masses of said fragments with masses of fragments obtained from cleavage of the related polynucleotides.

8. The method of claim 1, wherein variance in nucleotide sequence in related polynucleotides is detected, comprising:

a. replacing three of four natural nucleotides at substantially each point of occurrence in a polynucleotide with three stabilizing modified nucleotides to form a modified polynucleotide having one remaining natural nucleotide;

b. cleaving said modified polynucleotide into fragments at substantially each point of occurrence of said one remaining natural nucleotide;

c. determining the masses of said fragments; and,

d. comparing the masses of said fragments with the masses of fragments expected from cleavage of a related polynucleotide of known sequence, or

e. repeating steps a-c with one or more related polynucleotides of unknown sequence and comparing the masses of said fragments with masses obtained from cleavage of the related polynucleotides.

9. The method of claim 8, further comprising replacing said one remaining natural nucleotide with a destabilizing modified nucleotide.

10. The method of claim 1, wherein variance in nucleotide sequence in related polynucleotides is detected, comprising:

a. replacing two or more natural nucleotides at substantially each point of occurrence in a polynucleotide with two or more modified nucleotides, wherein each said modified nucleotide has a different cleaving characteristic from each other of said modified nucleotides, to form a modified polynucleotide;

b. cleaving said modified polynucleotide into first fragments at substantially each point of occurrence of a first of said two or more modified nucleotides;

c. cleaving said first fragments into second fragments at each point of occurrence of a second of said two or more modified nucleotides in said first fragments;

d. determining the masses of said first fragments and said second fragments; and,

e. comparing the masses of said first fragments and said second fragments with the masses of first fragments and second fragments expected from the cleavage of a related polynucleotide of known sequence, or

f. repeating steps a-d with one or more related polynucleotides of unknown sequence and comparing the masses of said first and second fragments with masses obtained from the cleavage of the related polynucleotides.

11. The method of claim 10, wherein the steps are repeated using a modified polynucleotide obtained by replacing different pairs of natural nucleotides with modified nucleotides; that is, replacing said first and a third, said second and a fourth, said first and said fourth, said second and said third, or said third and said fourth natural nucleotides with modified nucleotides.

12. The method of claim 10, wherein said cleavage comprises using a mass spectrometer.

13. The method of claim 12, wherein said mass spectrometer is a tandem mass spectrometer.

14. A method for determining nucleotide sequence in a polynucleotide, comprising:

a. replacing a natural nucleotide at a percentage of points of occurrence in a polynucleotide with a modified nucleotide to form a modified polynucleotide, wherein said modified nucleotide is not a ribonucleotide;

b. cleaving said modified polynucleotide into fragments at substantially each point of occurrence of said modified nucleotide;

c. repeating steps a and b, each time replacing a different natural nucleotide in said polynucleotide with a modified nucleotide; and,

d. determining the masses of said fragments obtained from each cleavage reaction; and,

e. constructing said sequence of said polynucleotide from said masses, or

f. analyzing a sequence ladder obtained from the fragments in step c.

15. A method for determining nucleotide sequence in a polynucleotide, comprising:

a. replacing a natural nucleotide at a first percentage of points of occurrence in a polynucleotide with a modified nucleotide to form a modified polynucleotide, wherein said modified nucleotide is not a ribonucleotide or a nucleoside α-thiotriphosphate;

b. cleaving said modified polynucleotide into fragments at a second percentage of said points of occurrence of said modified nucleotide such that the combination of said first percentage and said second percentage results in partial cleavage;

c. repeating steps a and b, each time replacing a different natural nucleotide in said polynucleotide with a modified nucleotide;

d. determining the masses of said fragments obtained from each cleavage reaction; and,

e. constructing said sequence of said polynucleotide from said masses, or

f. analyzing a sequence ladder obtained from said fragments in steps a and b.

16. The method of claim 1, wherein nucleotide sequence in a polynucleotide is detected, comprising:

a. replacing two or more natural nucleotides at substantially each point of occurrence in a polynucleotide with two or more modified nucleotides to form a modified polynucleotide;

b. separating said modified polynucleotide into two or more aliquots, the number of said aliquots being the same as the number of natural nucleotides replaced in step a; and,

c. cleaving said modified polynucleotide in each said aliquot into fragments at substantially each point of occurrence of a different one of said modified nucleotides such that each of said aliquots contains fragments from cleavage at a different modified nucleotide than each other said aliquot;

d. determining masses of said fragments; and,

e. constructing said nucleotide sequence from said masses; or,

f. cleaving said modified polynucleotide in each said aliquot into fragments at a percentage of points of occurrence of a different modified nucleotide such that each of said aliquots contains fragments from cleavage at a different modified nucleotide than each other said aliquot; and,

g. analyzing a sequence ladder obtained from said fragments in step f.

17. A method for determining nucleotide sequence in a polynucleotide, comprising:

a. replacing a first natural nucleotide at a percentage of points of incorporation in a polynucleotide with a first modified nucleotide, wherein said first modified nucleotide is not a ribonucleotide or a nucleoside α-thiotriphosphate, to form a first partially modified polynucleotide;

b. cleaving said first partially modified polynucleotide into fragments using said cleaving procedure of known cleavage efficiency to form a first set of nucleotide specific cleavage products;

c. repeating steps a and b, replacing a second, a third and a fourth natural nucleotide with a second, third and fourth modified nucleotide to form a second, third and fourth partially modified polynucleotide which, upon cleavage, afford a second, third and fourth set of nucleotide specific cleavage products;

d. performing gel electrophoresis on said first, second, third and fourth sets of nucleotide specific cleavage products to form a sequence ladder; and,

e. reading said sequence of said polynucleotide from said sequence ladder.

18. A method for cleaving a polynucleotide during polymerization, comprising:

mixing together four different nucleotides, one or two of which are modified nucleotides; and,

two or more polymerases, at least one of which produces or enhances cleavage at points where said modified nucleotide is being incorporated or, if two modified nucleotides are used, at points wherein one said modified nucleotide is followed immediately in sequence by the other said modified nucleotide.

19. The method of claim 18, wherein two modified nucleotides are used, one being a ribonucleotide and one being a 5'-amino-2',5'-dideoxynucleotide.

20. The method of claim 19, wherein two polymerases are used, one being Klenow (exo-) polymerase and one being mutant E710A Klenow (exo-) polymerase.

21. The method of any of claims 1, 6, 8, 10, 14, 15, 16, 17 or 18, wherein natural nucleotides not being replaced with modified nucleotides are replaced with mass-modified nucleotides.

22. The method of any of claims 1, 6, 8, 10, 14, 15, 16, 17 or 18, wherein said polynucleotide is selected from the group consisting of DNA and RNA.



23. The method of any of claims 1, 6, 8, 10, 14, 15, 16, 17 or 18, wherein said detection of said masses of said fragments comprises using mass spectrometry.

24. The method of claim 23, wherein said mass spectrometry is electrospray ionization mass spectrometry.

25. The method of claim 23, wherein said mass spectrometry is matrix-assisted laser desorption/ionization mass spectrometry (MALDI).

26. The method of claim 14, 15, or 16 wherein analyzing a sequence 
ladder comprises gel electrophoresis. 

10 27. The method of claim 17, further comprising: 

c. cleaving said first, second, third and fourth partially modified 
polynucleotide obtained in step a with one or more restriction enzymes to form 
restriction fragments; 

d. labeling the ends of said restriction fragments; and, 

15 e. purifying said labeled restriction fragments prior to performing 

step (b) of claim 17. 

28. A method for cleaving a polynucleotide such that substantially all 
fragments obtained from the cleavage carry a label, comprising: 

20 a. replacing a natural nucleotide partially or at substantially each 

point of occurrence in a polynucleotide with a modified nucleotide to form a modified 
polynucleotide; 

b. contacting, in the presence of a phosphine covalently bonded to 
a label, said modified polynucleotide with a reagent or reagents which cleave(s) the 
25 modified polynucleotide partially or at substantially each said point of occurrence. 

29. The method of claim 28, wherein said phosphine is
tris(carboxyethyl)phosphine.

30. The method of claim 28, wherein said label is selected from the group
consisting of a fluorescent tag and a radioactive tag.
31. A method for detecting a variance in nucleotide sequence in a
polynucleotide, for sequencing a polynucleotide or for genotyping a polynucleotide 
known to contain a polymorphism or mutation comprising: 

a. replacing one or more natural nucleotides in said polynucleotide
with one or more modified nucleotides wherein each modified nucleotide is modified
with one or more modifications selected from the group consisting of a modified
base, a modified sugar and a modified phosphate ester, provided that, if only one
natural nucleotide is being replaced, said modified nucleotide is not a ribonucleotide
or a nucleoside α-thiotriphosphate;
b. contacting said modified polynucleotide with a reagent or
reagents which cleave the modified polynucleotide into fragments at site(s) of
incorporation of said modified nucleotide; and,

c. analyzing said fragments to detect said variance, to construct 
said sequence or to genotype said polynucleotide. 
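Steps b and c of claim 31 can be pictured with a toy model. The Python sketch below cleaves a sequence completely at every occurrence of one modified nucleotide and compares the fragments obtained from a reference and from a sample. Assumptions, none of which come from the claims: each cut falls immediately 3' of the modified nucleotide; fragment base composition stands in for the fragment masses that mass spectrometry (claim 23) would measure; and `cleave_at`, `composition` and `variance` are hypothetical names.

```python
from collections import Counter
from typing import List


def cleave_at(sequence: str, modified: str) -> List[str]:
    """Complete cleavage at every occurrence of the `modified` nucleotide.

    Simplifying assumption: each cut falls immediately 3' of the modified
    nucleotide, which stays at the 3' end of its fragment.
    """
    fragments: List[str] = []
    start = 0
    for i, base in enumerate(sequence):
        if base == modified:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def composition(fragment: str) -> str:
    """Base composition such as 'A1T2', used here as a stand-in for mass."""
    counts = Counter(fragment)
    return "".join(f"{b}{counts[b]}" for b in "ACGT" if counts[b])


def variance(reference: str, sample: str, modified: str) -> Counter:
    """Fragment compositions found in one digest but not in the other."""
    ref = Counter(composition(f) for f in cleave_at(reference, modified))
    smp = Counter(composition(f) for f in cleave_at(sample, modified))
    return (ref - smp) + (smp - ref)
```

For a reference GATTACAGC and a sample carrying a T to C change at the fourth position, cleavage at A gives one fragment composition unique to each digest (A1T2 in the reference, A1C1T1 in the sample). Like masses, compositions cannot distinguish fragments that differ only in base order.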


32. The method of claim 31, wherein said modified nucleotide comprises a
modified base. 

33. The method of claim 32, wherein said modified base comprises
modified adenine.

34. The method of claim 33, wherein said modified adenine is 7-deaza-7- 
nitroadenine. 

35. The method of claim 34, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

36. The method of claim 34, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with a phosphine.
37. The method of claim 36, wherein contacting said modified
polynucleotide with a phosphine comprises contacting said modified polynucleotide 
with tris(2-carboxyethyl)phosphine. 

38. The method of claim 32, wherein said modified base comprises
modified cytosine.

39. The method of claim 38, wherein said modified cytosine comprises 
azacytosine. 


40. The method of claim 38, wherein said modified cytosine is a cytosine 
substituted at the 5-position with an electron withdrawing group. 

41. The method of claim 40, wherein said electron withdrawing group is
selected from the group consisting of nitro and halo.

42. The method of claim 39, wherein cleaving said modified polynucleotide 
into fragments comprises contacting said modified polynucleotide with chemical 
base. 


43. The method of claim 42, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with
tris(2-carboxyethyl)phosphine.

44. The method of claim 32, wherein said modified base comprises
modified guanine.



45. The method of claim 44, wherein said modified guanine is 7-methylguanine.
46. The method of claim 45, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical 
base. 

47. The method of claim 44, wherein said modified guanine is
N2-allylguanine.

48. The method of claim 47, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with an
electrophile.

49. The method of claim 48, wherein said electrophile is iodine. 

50. The method of claim 32, wherein said modified base is selected from
the group consisting of modified thymine and modified uracil.

51. The method of claim 50, wherein said modified thymine or said
modified uracil is 5-hydroxyuracil.

52. The method of claim 51, wherein cleaving said modified
polynucleotide into fragments comprises:

a. contacting said polynucleotide with a chemical oxidant; and, then 

b. contacting said polynucleotide with chemical base. 

53. The method of claim 31, wherein said modified nucleotide comprises a
modified sugar provided that, when only one type of modified nucleotide is being
used, it is not a ribonucleotide or a nucleoside α-thiophosphate.



54. The method of claim 53, wherein said modified sugar comprises a
2-ketosugar.
55. The method of claim 54, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical 
base. 

56. The method of claim 53, wherein said modified sugar comprises
arabinose.

57. The method of claim 56, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

58. The method of claim 53, wherein said modified sugar comprises a
4-hydroxymethyl group.

59. The method of claim 58, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

60. The method of claim 53, wherein said modified sugar comprises
hydroxycyclopentane.

61. The method of claim 60, wherein said hydroxycyclopentane comprises
1-hydroxy- or 2-hydroxycyclopentane.

62. The method of claim 60, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

63. The method of claim 53, wherein said modified sugar comprises an
azidosugar.
64. The method of claim 63, wherein said azidosugar comprises 2'-azido,
4'-azido or 4'-azidomethyl sugar. 

65. The method of claim 63, wherein cleaving said modified polynucleotide
into fragments comprises contacting said polynucleotide with
tris(2-carboxyethyl)phosphine (TCEP).

66. The method of claim 53, wherein said modified sugar comprises a 
group capable of photolyzing to form a free radical. 


67. The method of claim 66, wherein said group capable of photolyzing to
form a free radical is selected from the group consisting of phenylselenyl and
t-butylcarboxy.

68. The method of claim 66, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with ultraviolet
light.

69. The method of claim 53, wherein said modified sugar comprises a
cyanosugar.

70. The method of claim 69, wherein said cyanosugar is selected from the 
group consisting of 2'-cyanosugar and 2"-cyanosugar. 

71. The method of claim 69, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

72. The method of claim 53, wherein said modified sugar comprises an
electron withdrawing group.
73. The method of claim 72, wherein said electron withdrawing group is
selected from the group consisting of fluorine, azido, methoxy and nitro. 

74. The method of claim 73, wherein said electron withdrawing group is
located at the 2', 2" or 4' position of the modified sugar.

75. The method of claim 72, wherein cleaving said modified polynucleotide 
into fragments comprises contacting said modified polynucleotide with chemical 
base. 


76. The method of claim 53, wherein said modified sugar comprises an 
electron-withdrawing element in the sugar ring. 

77. The method of claim 76, wherein said electron-withdrawing element
comprises nitrogen.

78. The method of claim 77, wherein said nitrogen replaces the ring 
oxygen of said modified sugar. 

79. The method of claim 77, wherein said nitrogen replaces a ring carbon
of said modified sugar.

80. The method of claim 78, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

81. The method of claim 79, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.


82. The method of claim 53, wherein said modified sugar comprises a
mercapto group.
83. The method of claim 82, wherein said mercapto group is located in the
2' position of the sugar. 

84. The method of claim 82, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

85. The method of claim 53, wherein said modified sugar is selected from
the group consisting of a 5'-methylenyl-sugar, a 5'-keto-sugar and a
5',5'-difluoro-sugar.

86. The method of claim 85, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

87. The method of claim 31, wherein said modified nucleotide comprises a
modified phosphate ester provided that, when only one type of modified nucleotide is
used, it is not a nucleoside α-thiotriphosphate.


88. The method of claim 87, wherein said modified phosphate ester 
comprises a phosphorothioate. 

89. The method of claim 88, wherein the sulfur atom of said
phosphorothioate is not covalently bonded to a sugar ring.

90. The method of claim 89, wherein cleaving said modified polynucleotide
into fragments comprises:

a. contacting said sulfur of said phosphorothiolate with an
alkylating agent; and,

b. then contacting said modified polynucleotide with chemical
base.
91. The method of claim 90, wherein said alkylating agent is methyl iodide.

92. The method of claim 89, wherein cleaving said modified polynucleotide
into fragments comprises contacting said sulfur of said phosphorothioate with
β-mercaptoethanol in a chemical base.

93. The method of claim 92, wherein said chemical base comprises 
sodium methoxide in methanol. 


94. The method of claim 88, wherein the sulfur atom of said 
phosphorothiolate is covalently bonded to a sugar ring. 

95. The method of claim 94, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with chemical
base.

96. The method of claim 87, wherein said modified phosphate ester 
comprises a phosphoramidate. 


97. The method of claim 96, wherein cleaving said modified polynucleotide 
into fragments comprises contacting said modified polynucleotide with acid. 

98. The method of claim 87, wherein said modified phosphate ester
comprises a group selected from the group consisting of alkyl phosphonate and alkyl
phosphorotriester.

99. The method of claim 98, wherein said alkyl is methyl. 

100. The method of claim 96, wherein cleaving said modified polynucleotide
into fragments comprises contacting said modified polynucleotide with acid.
101. The method of claim 31, comprising replacing a first and a second
natural nucleotide with a first and a second modified nucleotide such that said 
polynucleotide can be specifically cleaved at sites where said first modified 
nucleotide is followed immediately in sequence by said second modified nucleotide. 
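The dinucleotide-specific cleavage of claim 101 can be sketched in Python as a search for adjacent pairs followed by a cut between the two nucleotides of each pair. This is a simplified model under stated assumptions: the sequence is read 5' to 3', "followed immediately" is taken to mean the next position in that reading, and the cut is placed at the modified internucleotide linkage between the pair. Which partner lies 5' of the other is fixed separately in claims 102 to 127, so `first` and `second` are simply parameters here; both function names are hypothetical.

```python
from typing import List


def dinucleotide_sites(sequence: str, first: str, second: str) -> List[int]:
    """Indices i where `first` at i is immediately followed by `second` at i + 1."""
    return [i for i in range(len(sequence) - 1)
            if sequence[i] == first and sequence[i + 1] == second]


def cleave_dinucleotide(sequence: str, first: str, second: str) -> List[str]:
    """Cut between the two nucleotides of every first-second dinucleotide."""
    fragments: List[str] = []
    start = 0
    for i in dinucleotide_sites(sequence, first, second):
        fragments.append(sequence[start:i + 1])
        start = i + 1
    # The remainder is never empty, because a site index is at most len - 2.
    fragments.append(sequence[start:])
    return fragments
```

For GATTACAGC with A as the first and C as the second nucleotide, the only site is the AC at positions 5 and 6, giving the fragments GATTA and CAGC. A sequence without the dinucleotide is returned uncut.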


102. The method of claim 101, wherein: 

said first modified nucleotide is covalently bonded at its 5' position to a sulfur 
atom of a phosphorothioate group; and, 

said second modified nucleotide, which is modified with a 2'-hydroxy group, is
contiguous to, and 5' of, said first modified nucleotide.

103. The method of claim 102, wherein cleaving said modified 
polynucleotide into fragments comprises contacting said modified polynucleotide 
with chemical base. 


104. The method of claim 101, wherein:

said first modified nucleotide is covalently bonded at its 3' position to a sulfur 
atom of a phosphorothioate group; and, 

said second modified nucleotide, which is modified with a 2'-hydroxy group, is
contiguous to and 3' of said first modified nucleotide.

105. The method of claim 104, wherein cleaving said modified
polynucleotide into fragments comprises contacting said modified polynucleotide 
with chemical base. 


106. The method of claim 101, wherein:

said first modified nucleotide is covalently bonded at its 5' position to a first 
oxygen atom of a phosphorothioate group; 

said second modified nucleotide is substituted at its 2' position with a leaving
group; and,

said second modified nucleotide is covalently bonded at its 3' position to a 
second oxygen of said phosphorothioate group. 
107. The method of claim 106, wherein said leaving group is selected from
the group consisting of fluorine, chlorine, bromine and iodine. 

108. The method of claim 106, wherein cleaving said modified
polynucleotide into fragments comprises contacting said modified polynucleotide
with chemical base.

109. The method of claim 108, wherein said chemical base comprises
sodium methoxide.

110. The method of claim 101, wherein:

said first modified nucleotide is covalently bonded at its 5' position to a first 
oxygen atom of a phosphorothioate group; 
said second modified nucleotide is substituted at its 4' position with a leaving
group; and,

said second modified nucleotide is covalently bonded at its 3' position to a 
second oxygen of said phosphorothioate group. 

111. The method of claim 110, wherein said leaving group is selected from
the group consisting of fluorine, chlorine, bromine and iodine.

112. The method of claim 110, wherein cleaving said modified
polynucleotide into fragments comprises contacting said modified polynucleotide
with chemical base.

113. The method of claim 112, wherein said chemical base comprises
sodium methoxide. 

114. The method of claim 101, wherein:

said first modified nucleotide is covalently bonded at its 5' position to a first 
oxygen atom of a phosphorothioate group; 
said second modified nucleotide is substituted at its 2' position with one or
two fluorine atoms; and, 

said second modified nucleotide is covalently bonded at its 3' position to a 
second oxygen of said phosphorothioate group. 


115. The method of claim 114, wherein cleaving said modified
polynucleotide into fragments comprises: 

a. contacting said modified polynucleotide with ethylene sulfide or
β-mercaptoethanol; and then,

b. contacting said modified polynucleotide with a chemical base.

116. The method of claim 115, wherein said chemical base comprises
sodium methoxide. 

117. The method of claim 101, wherein:

said first modified nucleotide is covalently bonded at its 5' position to a first
oxygen atom of a phosphorothioate group; 

said second modified nucleotide is substituted at its 2' position with a hydroxy 
group; and, 

said second modified nucleotide is covalently bonded at its 3' position to a
second oxygen of said phosphorothioate group.

118. The method of claim 117, wherein cleaving said modified
polynucleotide into fragments comprises: 
a. contacting said modified polynucleotide with a metal oxidant;
and then,

b. contacting said modified polynucleotide with a chemical base. 



119. The method of claim 118, wherein said metal oxidant is selected from
the group consisting of Cu(II) and Fe(III).
120. The method of claim 118, wherein said chemical base is selected from
the group consisting of dilute hydroxide, piperidine and dilute ammonium hydroxide. 

121. The method of claim 101, wherein: 

said first modified nucleotide is covalently bonded at its 5' position to a
nitrogen atom of a phosphoramidate group; and,

said second modified nucleotide, which is modified with a 2'-hydroxy group, is 
contiguous to and 5' of said first modified nucleotide. 

122. The method of claim 121, wherein cleaving said modified
polynucleotide comprises contacting said modified polynucleotide with acid.

123. The method of claim 101, wherein:

said first modified nucleotide is covalently bonded at its 3' position to a
nitrogen atom of a phosphoramidate group; and,

said second modified nucleotide, which is modified with a 2'-hydroxy group, is 
contiguous to and 3' of said first modified nucleotide. 

124. The method of claim 123, wherein cleaving said modified
polynucleotide into fragments comprises contacting said modified polynucleotide
with acid.

125. The method of claim 101, wherein:

said first modified nucleotide is covalently bonded at its 5' position to an
oxygen atom of an alkylphosphonate or an alkylphosphorotriester group; and,

said second modified nucleotide, which is modified with a 2'-hydroxy group, is
contiguous to said first modified nucleotide.

126. The method of claim 125, wherein cleaving said modified
polynucleotide into fragments comprises contacting said modified polynucleotide
with acid.
127. The method of claim 101, wherein:

said first modified nucleotide has an electron-withdrawing group at its 4' 
position; and, 

said second modified nucleotide, which is modified with a 2'-hydroxy group, is 
contiguous to and 5' of said first modified nucleotide. 

128. The method of claim 127, wherein cleaving said modified 
polynucleotide into fragments comprises contacting said modified polynucleotide 
with acid. 

129. A compound having the chemical structure: 
[Chemical structure not reproduced]
wherein R1 is selected from the group consisting of:






R2 is selected from the group consisting of cytosine, guanine, inosine and uracil;
and, "Base" is selected from the group consisting of cytosine, guanine, inosine, 
thymine and uracil. 



130. A polynucleotide comprising a dinucleotide sequence selected from the
group consisting of: 
wherein

each "Base" is independently selected from the group consisting of adenine,
cytosine, guanine and thymine;
W is an electron withdrawing group; 
X is a leaving group; and, 
R is a lower alkyl group; wherein, 
a second W or X shown on the same carbon atom represents that a single W or X
can be in either position or both Ws or Xs can exist simultaneously.

131. The compound of claim 130, wherein said electron withdrawing group
is selected from the group consisting of F, Cl, Br, I, NO2, C≡N, -C(O)OH and OH.


132. The compound of claim 130, wherein said leaving group is selected
from the group consisting of Cl, Br, I and OTs.

133. A method for synthesizing a polynucleotide comprising mixing a
compound having the chemical structure:
[Chemical structure not reproduced]

wherein R1 is selected from the group consisting of:




with adenosine triphosphate, guanosine triphosphate, and thymidine or uridine 
triphosphate in the presence of one or more polymerases. 

134. A method for synthesizing a polynucleotide comprising mixing a 
compound having the chemical structure: 

[Chemical structure not reproduced]
wherein R1 is selected from the group consisting of:




with adenosine triphosphate, cytidine triphosphate and guanosine triphosphate in 
the presence of one or more polymerases. 

135. A method for synthesizing a polynucleotide, comprising mixing a 
compound having the chemical structure: 

[Chemical structure not reproduced]




wherein R1 is selected from the group consisting of:




with cytidine triphosphate, guanosine triphosphate, and thymidine triphosphate in 
the presence of one or more polymerases. 
136. A method for synthesizing a polynucleotide, comprising mixing a
compound having the chemical structure: 
[Chemical structure not reproduced]

wherein R1 is selected from the group consisting of:




with adenosine triphosphate, cytidine triphosphate and thymidine triphosphate in the
presence of one or more polymerases.

137. A method for synthesizing a polynucleotide, comprising mixing a 
compound of claim 129 with whichever three of the four nucleoside triphosphates, 
adenosine triphosphate, cytidine triphosphate, guanosine triphosphate and 

thymidine triphosphate, do not contain a base (or its substitute) present in the
compound of claim 129 used, in the presence of one or more polymerases. 

138. A method for synthesizing a polynucleotide, comprising mixing one of 
the following pairs of compounds: 

wherein:

Base1 is selected from the group consisting of adenine, cytosine, guanine or inosine,
and thymine or uracil;

Base2 is selected from the group consisting of the remaining three bases which are
not Base1;

R3 is ⁻O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-P(=O)(O⁻)-O-;
R is a lower alkyl group; 
W is an electron-withdrawing group;
X is a leaving group; wherein 

a second W or X shown on the same carbon atom represents that a single W 
or X can be in either position or both Ws or Xs can exist simultaneously; 
with whichever two of the four nucleoside triphosphates, adenosine triphosphate,
cytidine triphosphate, guanosine triphosphate and thymidine triphosphate, do not
contain Base1 or Base2 (or their substitutes), in the presence of one or more
polymerases. 
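Claims 133 to 138 share one bookkeeping rule: the modified compound(s) supply one or two bases, and the remaining natural nucleoside triphosphates make up the rest of the mixture. The Python sketch below is a minimal illustration of that selection, not part of the claimed method. It follows claim 138 in grouping inosine with guanine and uracil with thymine; the single-letter codes and the name `remaining_triphosphates` are assumptions made for the example.

```python
from typing import Iterable, List

# Natural nucleoside triphosphates named in claims 133 to 138.
NATURAL_TRIPHOSPHATES = {
    "A": "adenosine triphosphate",
    "C": "cytidine triphosphate",
    "G": "guanosine triphosphate",
    "T": "thymidine triphosphate",
}

# Substitute bases grouped with a natural base in claim 138
# ("guanine or inosine", "thymine or uracil").
SUBSTITUTES = {"I": "G", "U": "T"}


def remaining_triphosphates(modified_bases: Iterable[str]) -> List[str]:
    """Natural triphosphates whose bases are not supplied by the modified compounds."""
    replaced = {SUBSTITUTES.get(b, b) for b in modified_bases}
    unknown = replaced - set(NATURAL_TRIPHOSPHATES)
    if unknown:
        raise ValueError(f"unrecognized base(s): {sorted(unknown)}")
    return [name for base, name in NATURAL_TRIPHOSPHATES.items()
            if base not in replaced]
```

A modified cytidine compound leaves adenosine, guanosine and thymidine triphosphates, as in claim 133. A modified adenine/inosine pair under claim 138 leaves cytidine and thymidine triphosphates.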

139. A mutant polymerase which is capable of catalyzing the incorporation
of a modified nucleotide into a polynucleotide, wherein said modified nucleotide is not
a ribonucleotide, obtained by a process comprising DNA shuffling.

140. The polymerase of claim 139, wherein said DNA shuffling process 
comprises: 

a. selecting one or more known polymerase(s); 

b. performing DNA shuffling; 

c. transforming shuffled DNA into a host cell; 

d. growing host cell colonies; 

e. forming a lysate from said host cell colony; 

f. adding a DNA template containing a detectable reporter 
sequence, the modified nucleotide or nucleotides whose incorporation into a 
polynucleotide is desired and the natural nucleotides not being replaced by said 
modified nucleotide(s); and, 

g. examining the lysate for the presence of the detectable reporter. 

141. The polymerase of claim 139, wherein said DNA shuffling process
comprises: 

a. selecting a known polymerase or two or more known 
polymerases having different sequences or different biochemical properties or both; 

b. performing DNA shuffling; 

c. transforming said shuffled DNA into a host to form a library of 
transformants in host cell colonies; 

d. preparing first separate pools of said transformants by plating 
said host cell colonies; 

e. forming a lysate from each said first separate pool of host cell
colonies;
f. removing all natural nucleotides from each said lysate;

g. combining each said lysate with: 

i. a single-stranded DNA template comprising a sequence 
corresponding to an RNA polymerase promoter followed by a reporter 
sequence; 

ii. a single-stranded DNA primer complementary to one end 
of said template; 

iii. the modified nucleotide or nucleotides whose 
incorporation into said polynucleotide is desired; 

iv. each natural nucleotide not being replaced by said 
modified nucleotide or nucleotides; 

h. adding RNA polymerase to each said combined lysate; 

i. examining each said combined lysate for the presence of said 
reporter sequence; 

j. creating second separate pools of transformants in host cell 
colonies from each said first separate pool of host cell colonies in which the 
presence of said reporter is detected; 

k. forming a lysate from each said second separate pool of host 

cell colonies; 

l. repeating steps g, h, i, j, k and l to form separate pools of
transformants in host cell colonies until only one host cell colony remains which
contains said polymerase; and,

m. recloning said polymerase from said one host cell colony into a 
protein expression vector. 
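The iterative pooling of claim 141 (steps d to l) works like sib selection: positive pools are split into smaller pools and retested until a single colony remains. The Python sketch below models only that narrowing loop. `pool_is_positive` is a hypothetical stand-in for steps e to i (lysing the pooled colonies, primer extension with the modified nucleotide(s), transcription with RNA polymerase and testing for the reporter); `n_pools` and the choice to follow the first positive pool are illustrative assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sib_select(
    colonies: Sequence[T],
    pool_is_positive: Callable[[Sequence[T]], bool],
    n_pools: int = 4,
) -> T:
    """Narrow a library down to one reporter-positive colony.

    `pool_is_positive` is a hypothetical stand-in for steps e-i of claim 141.
    """
    if n_pools < 2:
        raise ValueError("n_pools must be at least 2")
    current: List[T] = list(colonies)
    if not current or not pool_is_positive(current):
        raise ValueError("starting library gives no reporter signal")
    while len(current) > 1:
        size = -(-len(current) // n_pools)  # ceiling division
        pools = [current[i:i + size] for i in range(0, len(current), size)]
        # Follow the first pool that still gives a signal (steps j and k).
        positive = next((p for p in pools if pool_is_positive(p)), None)
        if positive is None:
            raise ValueError("reporter signal lost on subdivision")
        current = positive
    return current[0]
```

Because each round divides the current pool into at least two smaller pools, the loop always terminates. With four pools per round, a library of 100 colonies needs only four rounds of subdivision to isolate the single active clone.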

142. A mutant polymerase which is capable of catalyzing the incorporation 
of a modified nucleotide into a polynucleotide, wherein said modified nucleotide is 
not a ribonucleotide, obtained by a process comprising cell senescence selection.

143. The polymerase of claim 142, wherein said cell senescence selection 
comprises: 
a. mutagenizing a known polymerase to form a library of mutant

polymerases; 

b. cloning said library into a vector; 

c. transforming said vector into host cells selected so as to be
susceptible to being killed by a selected chemical only when said cell is actively
growing;

d. adding a modified nucleotide; 

e. growing said host cells; 

f. treating said host cells with said selected chemical; 
g. separating living cells from dead cells; and,

h. isolating said polymerase or polymerases from said living cells. 

144. The polymerase of claim 143, wherein steps d to g are repeated one or 
more times. 


145. The polymerase of claim 142, wherein said process comprises: 

a. mutagenizing a known polymerase to form a library of mutant 

polymerases; 

b. cloning said library of mutant polymerases into a plasmid vector; 
c. transforming with said plasmid vector bacterial cells that, when
growing, are susceptible to an antibiotic;

d. selecting transfectants using said antibiotic; 

e. introducing a modified nucleotide, as the corresponding 
nucleoside triphosphate, into the bacterial cells; 

f. growing the cells;

g. adding an antibiotic which will kill bacterial cells that are actively 

growing; 

h. isolating said bacterial cells; 

i. growing said bacterial cells in fresh medium containing no
antibiotic;

j. selecting live cells from growing colonies; 

k. isolating said plasmid vector from said live cells; 
l. isolating said polymerase; and,
m. assaying said polymerase. 



146. The polymerase of claim 145, wherein steps c to k of the process are
repeated one or more additional times before proceeding to step l.

147. The polymerase of claim 139, wherein said polymerase is a heat stable 
polymerase. 

148. A mutant polymerase which is capable of catalyzing the incorporation
of a modified nucleotide into a polynucleotide, wherein said modified nucleotide is
not a ribonucleotide, obtained by a process comprising phage display.



149. The mutant polymerase of claim 148, wherein said phage display
comprises:

(a) selecting a DNA polymerase; 

(b) expressing said polymerase in a bacteriophage vector as a fusion to a 
bacteriophage coat protein; 

(c) attaching an oligonucleotide to the surface of said phage; 

(d) forming a primer template complex either by addition of a second
oligonucleotide complementary to the oligonucleotide of (c) or by formation of a
self-priming complex using intramolecular complementarity of the oligonucleotide of (c);

(e) performing a primer extension in the presence of a modified nucleotide
or nucleotides and those natural nucleotides that are not being replaced by said
modified nucleotide(s), one of said natural nucleotides being labeled with a
detectable reporter; and,

(f) sorting the phage with detectable reporter from the phage without 
detectable reporter. 






150. The polymerase of claim 139, 142 or 148, wherein said modified 
nucleotide is selected from the group consisting of: 
a compound having the chemical structure: 
[Chemical structure not reproduced]

a compound having the chemical structure:

[Chemical structure not reproduced]

wherein said "Base" is selected from the group consisting of adenine, cytosine,
guanine, inosine and uracil; 


a compound having the chemical structure: 
[Chemical structure not reproduced]

wherein said "Base" is selected from the group consisting of adenine, cytosine,
guanine, inosine, thymine and uracil; and, 

a compound having the chemical structure: 
[Chemical structure not reproduced]

151. A kit, comprising:

one or more modified nucleotides; 

one or more polymerases capable of incorporating said one or more modified
nucleotides in a polynucleotide to form a modified polynucleotide; and,

a reagent or reagents capable of cleaving said modified polynucleotide at
each point of occurrence of said one or more modified nucleotides in said
polynucleotide.



Figure 1 & 2
Figure 3




Molecular size markers 




Piperidine cleavage products 
Figure 4
Figure 5








WO 00/18967 



PCT/US99/22988 







Figure 7 
Figure 10
Figure 11A
Figure 11B
Figure 12
Figure 13